Chairs and Mugs

-- A Dataset for Object-Centric Scene Understanding and Equivariance

Jiahui Lei1        Congyue Deng2        Karl Schmeckpeper1        Leonidas Guibas2        Kostas Daniilidis1


This dataset provides 3D scenes containing repeated objects from the same category (chairs or mugs) under a variety of scene configurations, from the simplest case, with all objects standing upright on a ground plane, to the most challenging cases, with diverse object poses and complex backgrounds. Such scenarios are common in highly interactive real-world environments. The dataset encourages the development of scene understanding methods that are:
  • object-centric, leveraging category-level information on repeating instances
  • robust to scene configuration changes
  • generalizable to unseen or even out-of-distribution scene configurations
The dataset comprises two subsets: a synthetic set for developing, training, and validating scene understanding methods, and a small set of real-world scans for evaluating performance across the sim-to-real domain gap.

Dataset details

Synthetic scenes: Our synthetic dataset is simulated with SAPIEN. For the tabletop scenes, we place four synthetic depth cameras at the four corners of a table and place the objects in a bin at the center of the table, a common setup for tabletop manipulators. We simulate realistic IR depth-sensor patterns with IR ray tracing, and we create the mesh reconstruction by integrating the four depth views via TSDF fusion. For the chair scenes, we use eight static cameras with ideal depth (instead of IR ray tracing) because, unlike tabletop scenes, real-world indoor scenes are usually captured with continuous scans, which yield smoother and better reconstructions.
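To make the TSDF fusion step concrete, here is a minimal NumPy sketch of projective TSDF integration with a running weighted average. It is an illustration under stated assumptions, not the pipeline used to build the dataset: the function name `integrate_depth`, its parameters, the pinhole camera model, and the truncation distance are all hypothetical choices.

```python
import numpy as np

def integrate_depth(tsdf, weight, voxels_world, depth, K, cam_from_world, trunc=0.02):
    """Fuse one depth image into a running TSDF (hypothetical helper).

    tsdf, weight   : (N,) float arrays, updated in place
    voxels_world   : (N, 3) voxel centers in world coordinates
    depth          : (H, W) depth image in meters, 0 marks invalid pixels
    K              : (3, 3) pinhole intrinsics
    cam_from_world : (4, 4) rigid transform, world -> camera (camera looks along +z)
    trunc          : truncation distance in meters
    """
    h, w = depth.shape
    # Transform voxel centers into the camera frame.
    pts = voxels_world @ cam_from_world[:3, :3].T + cam_from_world[:3, 3]
    z = pts[:, 2]
    in_front = z > 1e-6
    safe_z = np.where(in_front, z, 1.0)  # avoid division by zero behind the camera
    # Project voxel centers to integer pixel coordinates.
    u = np.round(K[0, 0] * pts[:, 0] / safe_z + K[0, 2]).astype(int)
    v = np.round(K[1, 1] * pts[:, 1] / safe_z + K[1, 2]).astype(int)
    visible = in_front & (u >= 0) & (u < w) & (v >= 0) & (v < h)
    # Look up the observed depth at each visible voxel's pixel.
    observed = np.zeros_like(z)
    observed[visible] = depth[v[visible], u[visible]]
    # Signed distance along the optical axis: positive in front of the surface.
    sdf = observed - z
    # Skip invalid pixels and voxels far behind the observed surface.
    update = visible & (observed > 0) & (sdf >= -trunc)
    new_val = np.clip(sdf / trunc, -1.0, 1.0)
    # Running weighted average with unit weight per observation.
    tsdf[update] = (tsdf[update] * weight[update] + new_val[update]) / (weight[update] + 1.0)
    weight[update] += 1.0
    return tsdf, weight
```

Under these assumptions, fusing several views amounts to calling the function once per camera with that camera's depth image, intrinsics, and pose. A mesh can then be extracted from the zero level set of the fused volume, for example with marching cubes.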
Real scans: Our real dataset contains 240 reconstructions of real scenes with challenging configurations and backgrounds. We collected more data for scenes with more complex configurations and for scenes that are harder to create in simulation.
 Scene type     Number of scans
 Mugs Z         10
 Mugs SO3       10
 Mugs Pile      10
 Mugs Tree      50
 Mugs Others    50
 Mugs Wild      50
 Chairs Z       20
 Chairs SO3     20
 Chairs Pile    20
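The scene names suggest that "Z" refers to upright objects rotated only about the vertical axis and "SO3" to objects under arbitrary 3D rotations. This reading is an assumption based on the description of upright versus diverse object poses, not something the page states. Under that assumption, the two pose distributions could be sampled as in the following sketch; the function names are hypothetical.

```python
import numpy as np

def random_z_rotation(rng):
    """Rotation by a uniformly random angle about the vertical (z) axis."""
    theta = rng.uniform(0.0, 2.0 * np.pi)
    c, s = np.cos(theta), np.sin(theta)
    return np.array([[c, -s, 0.0],
                     [s,  c, 0.0],
                     [0.0, 0.0, 1.0]])

def random_so3_rotation(rng):
    """Uniformly random rotation in SO(3), via a normalized Gaussian quaternion."""
    q = rng.normal(size=4)
    q /= np.linalg.norm(q)
    w, x, y, z = q
    # Standard unit-quaternion to rotation-matrix conversion.
    return np.array([
        [1 - 2 * (y * y + z * z), 2 * (x * y - z * w),     2 * (x * z + y * w)],
        [2 * (x * y + z * w),     1 - 2 * (x * x + z * z), 2 * (y * z - x * w)],
        [2 * (x * z - y * w),     2 * (y * z + x * w),     1 - 2 * (x * x + y * y)],
    ])

rng = np.random.default_rng(0)
R_z = random_z_rotation(rng)      # keeps the object's up axis fixed
R_so3 = random_so3_rotation(rng)  # arbitrary orientation
```

A z-axis rotation leaves the up direction unchanged, whereas a general SO(3) rotation does not, which is why methods that work in upright scenes may fail in the arbitrary-pose scenes unless they are rotation-equivariant or trained with full rotation augmentation.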


EFEM: Equivariant Neural Field Expectation Maximization for 3D Object Segmentation Without Scene Supervision
PDF | Code

Data Samples

Synthetic scenes

  • Mugs Z

  • Mugs SO3

  • Mugs Pile

  • Mugs Tree

  • Mugs Box

  • Mugs Shelf

Real scans

  • Mugs Z

  • Mugs SO3

  • Mugs Pile

  • Mugs Tree

  • Mugs Wild

  • Mugs Others

  • Chairs Z

  • Chairs SO3

  • Chairs Pile


To download the synthetic scenes click here
To download the real scans click here


If you use the dataset or code, please cite:
title={EFEM: Equivariant Neural Field Expectation Maximization for 3D Object Segmentation Without Scene Supervision},
author={Lei, Jiahui and Deng, Congyue and Schmeckpeper, Karl and Guibas, Leonidas and Daniilidis, Kostas},
booktitle={Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition},