Object-Level SLAM for Autonomous Navigation
Ellipsoid landmarks are useful
While dense geometric maps such as occupancy grids or voxel representations provide a detailed metric understanding of the world, object-level maps carry compact information with far fewer parameters. Ellipsoids are a compelling choice of approximate geometric model (similar to 3D cuboids) because they have an elegant mathematical formulation that provides a closed-form relationship between 2D bounding box detections and 3D object size and pose. Using this relationship, we can build coarse representations of the world, such as in this simple simulation:
Figure: Ground-truth ellipsoids and cameras shown in black, while estimated camera positions and ellipsoids are shown in red. A simulated camera with bounding box detections in green is shown in the top right.
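The closed-form relationship mentioned above is commonly written using dual quadrics: an ellipsoid becomes a 4x4 symmetric matrix Q*, a camera with projection matrix P maps it to the dual conic C* = P Q* P^T, and the axis-aligned box around that conic can be read off directly. The following is a minimal sketch of that standard construction, assuming a pinhole camera and NumPy. The function names and the example camera are illustrative and are not taken from this project's code.

```python
import numpy as np

def dual_quadric(center, rotation, semi_axes):
    """Build the 4x4 dual quadric Q* of an ellipsoid from its 9 parameters.

    center: (3,) position, rotation: (3, 3) rotation matrix,
    semi_axes: (3,) half-lengths along the ellipsoid's own axes.
    """
    Z = np.eye(4)
    Z[:3, :3] = rotation
    Z[:3, 3] = center
    a, b, c = semi_axes
    Q_canonical = np.diag([a * a, b * b, c * c, -1.0])
    return Z @ Q_canonical @ Z.T

def project_to_bbox(Q_dual, P):
    """Project Q* with a 3x4 camera matrix P; return (xmin, ymin, xmax, ymax).

    Assumes the whole ellipsoid lies in front of the camera.
    """
    C = P @ Q_dual @ P.T              # dual conic of the ellipsoid's outline
    if abs(C[2, 2]) < 1e-12:
        raise ValueError("degenerate projection")
    C = C / C[2, 2]
    # Vertical/horizontal tangent lines of the conic give the box edges.
    disc_u = C[0, 2] ** 2 - C[0, 0]
    disc_v = C[1, 2] ** 2 - C[1, 1]
    if disc_u < 0 or disc_v < 0:
        raise ValueError("outline is not a real ellipse in this view")
    du, dv = np.sqrt(disc_u), np.sqrt(disc_v)
    return np.array([C[0, 2] - du, C[1, 2] - dv, C[0, 2] + du, C[1, 2] + dv])

# Example: unit sphere 5 m in front of a camera at the origin (assumed intrinsics).
K = np.array([[500.0, 0.0, 320.0], [0.0, 500.0, 240.0], [0.0, 0.0, 1.0]])
P = K @ np.hstack([np.eye(3), np.zeros((3, 1))])
Q = dual_quadric(np.array([0.0, 0.0, 5.0]), np.eye(3), np.array([1.0, 1.0, 1.0]))
print(project_to_bbox(Q, P))  # approximately [217.94, 137.94, 422.06, 342.06]
```

In a SLAM back end, the difference between this predicted box and the detected box would serve as the measurement residual for each camera-object pair.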
Ellipsoid representations are difficult to constrain under low viewpoint diversity
However, naive implementations rely on diverse viewpoints, such as the orbiting path shown above, to perform well. When vehicles navigate efficiently, straight-line, low-baseline maneuvers are more common. Such camera motions are challenging for monocular SLAM in general, and under these conditions the ellipsoid estimates degrade.
Figure: Ellipsoid estimation quality is degraded under straight-line camera trajectories.
We introduced new geometric and semantic information
We proposed adding two types of information to the SLAM graph: a plane measurement (shown in blue in the simulation below) and a semantic shape prior based on the detected object class. Neither requires additional raw measurements, and both can dramatically improve estimation accuracy under camera trajectories more typical of autonomous navigation.
Figure: ROSHAN’s proposed additional measurements can improve object-level SLAM with ellipsoid landmarks under challenging camera motions
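To make the two kinds of extra information concrete, here is one plausible way they could enter a factor graph as residuals. This is a hedged sketch, not the formulation from the paper: it assumes the plane is a supporting surface (such as the ground) that the ellipsoid should touch, and that the shape prior is a per-class Gaussian over semi-axis lengths. All function names and the numeric prior values are hypothetical.

```python
import numpy as np

def plane_tangency_residual(center, rotation, semi_axes, plane):
    """Distance by which an ellipsoid misses being tangent to a plane.

    plane: (4,) array (n_x, n_y, n_z, d) describing n . x + d = 0.
    Returns 0 when the ellipsoid touches the plane, a positive value when
    it floats off the plane, and a negative value when it intersects it.
    """
    plane = np.asarray(plane, dtype=float)
    plane = plane / np.linalg.norm(plane[:3])      # make the normal unit length
    n, d = plane[:3], plane[3]
    # Extent of the ellipsoid along n (its support function).
    extent = np.linalg.norm(np.asarray(semi_axes) * (rotation.T @ n))
    return abs(n @ center + d) - extent

def shape_prior_residual(semi_axes, prior_mean, prior_std):
    """Whitened difference between estimated and class-typical semi-axes."""
    return (np.asarray(semi_axes) - np.asarray(prior_mean)) / np.asarray(prior_std)

# Hypothetical example: an object resting on the ground plane z = 0.
ground = np.array([0.0, 0.0, 1.0, 0.0])
center = np.array([0.0, 0.0, 0.5])
axes = np.array([1.0, 2.0, 0.5])
print(plane_tangency_residual(center, np.eye(3), axes, ground))

# Hypothetical class prior (illustrative numbers only, in metres).
print(shape_prior_residual(axes, prior_mean=[1.0, 2.0, 0.6], prior_std=[0.2, 0.3, 0.1]))
```

Both residuals depend only on quantities already in the graph (the ellipsoid parameters, a plane estimate, and the detected class label), which is consistent with the point that no additional raw measurements are needed.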
To read more about this project, including quantitative results, results on photo-realistic simulation data, and results on real-world data, see our paper:
Related Threads:
Here are a few related projects that I led or collaborated on.
Ellipsoid Object Representations for Deep Learning. It is not surprising that the various mathematical forms of ellipsoid representations are useful outside of the context of mapping. In other work, we’ve used the ellipsoid representation to lower the annotation latency of deeply learned 3D object estimation. [project overview]
Using Object-Level Representations for Planning. How can object-level maps improve autonomous navigation in novel environments beyond simple collision avoidance? We investigated how to combine object-level maps with geometric maps to improve navigation outcomes. [project overview]
Hierarchical Mapping Representations. Ellipsoid representations have 9 free parameters (3 for position, 3 for orientation, and 3 for axis lengths), which can be poorly constrained even after additional constraining information has been added. Follow-up work in this space (led by my co-author Kyel) introduced the notion of an evolving hierarchy of abstractions, where representations change from patches to points to spheres to ellipsoids as further information is acquired. [pdf]
Object-Level Data Association. A practical challenge in any SLAM problem is data association, i.e., determining whether a new object detection belongs to an existing object or to a new one. The data association step is often assumed to be solved in a separate process, but poor data association can have detrimental effects on the estimation process. An undergraduate researcher who worked with me investigated methods for unsupervised learning of a descriptor space, with some neat insights. [pdf]