Weakly Supervised Deep Object Estimation
Annotations for 3D object estimation are expensive
Deep learning is an attractive paradigm for monocular object estimation because of its potential for low-latency inference. However, the 3D annotations required for fully supervised learning can be difficult to obtain, as annotators may need to interface with 3D data such as point clouds. Another approach is to rely on detailed a priori models, such as meshes, but these are again often non-trivial to obtain.
We propose to approximate objects as ellipsoids, enabling training from 2D annotations only
In this work, we propose an approach called VoluMon (for volumetric monocular estimation) that approximates objects as ellipsoids and weakly supervises learning using 2D image-space annotations. Such 2D annotations, e.g. bounding boxes or object segmentations, are generally much easier to obtain than 3D information.
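The geometric idea that makes this weak supervision possible is the classical result that an ellipsoid, written as a dual quadric, projects under a pinhole camera to a dual conic (an ellipse) in the image, which can then be compared against a 2D bounding box. The sketch below illustrates that projection; it is not VoluMon's exact training loss, and the camera intrinsics and object pose are made-up values for illustration.

```python
import numpy as np

def ellipsoid_dual_quadric(center, axes):
    """4x4 dual quadric Q* for an axis-aligned ellipsoid at `center`
    with semi-axis lengths `axes` (rotation omitted for brevity)."""
    T = np.eye(4)
    T[:3, 3] = center
    Q = np.diag([axes[0] ** 2, axes[1] ** 2, axes[2] ** 2, -1.0])
    return T @ Q @ T.T

def project_to_bbox(Q_star, K):
    """Project the dual quadric with camera P = K [I | 0] and read the
    axis-aligned bounding box off the resulting dual conic C*."""
    P = K @ np.hstack([np.eye(3), np.zeros((3, 1))])
    C = P @ Q_star @ P.T
    # Tangent lines x = k satisfy C00 - 2*k*C02 + k^2*C22 = 0
    # (analogously for y), giving the extremal image coordinates.
    dx = np.sqrt(C[0, 2] ** 2 - C[0, 0] * C[2, 2])
    dy = np.sqrt(C[1, 2] ** 2 - C[1, 1] * C[2, 2])
    xs = (C[0, 2] + np.array([-dx, dx])) / C[2, 2]
    ys = (C[1, 2] + np.array([-dy, dy])) / C[2, 2]
    return min(xs), min(ys), max(xs), max(ys)

# Unit sphere 5 m in front of a camera with focal length 500 px.
K = np.array([[500.0, 0.0, 320.0],
              [0.0, 500.0, 240.0],
              [0.0, 0.0, 1.0]])
bbox = project_to_bbox(
    ellipsoid_dual_quadric([0.0, 0.0, 5.0], [1.0, 1.0, 1.0]), K)
print(bbox)  # a box centered on the principal point (320, 240)
```

A training loss could then penalize the discrepancy between this projected box and an annotated detection box; rotation is dropped here only to keep the sketch short.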
Figure: Example of image-space annotations that can be used to train VoluMon. These annotations are acquired using a pre-trained neural network, further lowering the annotation burden for deep 3D object estimation.
VoluMon lowers the annotation barrier for monocular object estimation
At inference time, the input to the VoluMon network is a single RGB image and a 2D bounding box around the object of interest, and the output is an ellipsoid estimate, shown here overlaid on the point cloud as a blue mesh for visualization purposes. Although we do not have ground-truth annotations for these objects, both the size and translation estimates are generally qualitatively reasonable.
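For a concrete picture of the output representation, an ellipsoid can be described by nine numbers: a 3D center, three semi-axis lengths, and a rotation. The snippet below is my own minimal parameterization (not necessarily the network's exact output head); it samples surface points from such an ellipsoid, e.g. to build the kind of overlay mesh shown in the figures. The mug-sized dimensions are illustrative values.

```python
import numpy as np

def ellipsoid_surface_points(center, axes, R, n=16):
    """Sample points on the ellipsoid x = center + R @ (axes * u),
    where u ranges over the unit sphere."""
    theta = np.linspace(0.0, np.pi, n)        # polar angle
    phi = np.linspace(0.0, 2.0 * np.pi, n)    # azimuth
    t, p = np.meshgrid(theta, phi)
    unit = np.stack([np.sin(t) * np.cos(p),   # (n, n, 3) unit-sphere grid
                     np.sin(t) * np.sin(p),
                     np.cos(t)], axis=-1)
    return center + (np.asarray(axes) * unit) @ R.T

# An ellipsoid roughly the size of a mug, 0.5 m in front of the camera.
pts = ellipsoid_surface_points(center=np.array([0.0, 0.0, 0.5]),
                               axes=[0.05, 0.05, 0.08],
                               R=np.eye(3))
```

The resulting point grid can be handed to any mesh or point-cloud viewer to render the kind of overlay shown above.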
Figure: A time series of object estimates for an orange.
Figure: A single estimated ellipsoid for a mug object.
For a more detailed overview (as well as quantitative experiments), check out the manuscript, or my IROS 2021 talk.
Related Threads:
Here are a few related projects that I led or collaborated on.
Object Level SLAM with Ellipsoid Representations for Autonomous Navigation. Another approach to building object-level maps is to fuse multiple observations collected over the vehicle’s trajectory. We proposed several improvements to object-level SLAM with ellipsoid representations to make the algorithm more suitable for autonomous navigation. [project page]
Using Object-Level Representations for Planning. How can object-level maps improve autonomous navigation in novel environments beyond simple collision avoidance? We investigated how to combine object-level maps with geometric maps to improve navigation outcomes. [project page]