3D localization


3D localization of objects using point clouds

The Challenge

Vision is an essential component of any self-navigation system, such as an autonomous car. The more an autonomous system understands its surroundings, the better it can navigate. 3D object detection and localization play an important role in environmental understanding: they not only identify surrounding objects but also locate them relative to the vehicle. A self-driving car continuously collects data from multiple sensors. In this case study, we used data collected from a lidar sensor and six camera sensors to detect and localize surrounding cars, drawing on the Lyft self-driving dataset published in 2019.



Data Preparation: We voxelized the lidar point cloud and used the bird's-eye view (BEV) and cylindrically projected camera images as model input.
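As an illustration of the voxelization step, the sketch below rasterizes a lidar point cloud into a BEV grid of height-slice occupancy maps plus a point-density channel. The grid extents, resolution, slice count, and density normalization are illustrative assumptions, not the exact parameters we used.

```python
import numpy as np

def pointcloud_to_bev(points, x_range=(0.0, 70.0), y_range=(-40.0, 40.0),
                      z_range=(-2.0, 1.0), res=0.1, n_slices=5):
    """Rasterize an (N, 3) lidar point cloud into a BEV grid.

    Returns an (H, W, n_slices + 1) array: one occupancy channel per
    height slice plus one point-density channel.
    """
    h = int((y_range[1] - y_range[0]) / res)
    w = int((x_range[1] - x_range[0]) / res)
    bev = np.zeros((h, w, n_slices + 1), dtype=np.float32)

    # Keep only points that fall inside the grid extents.
    mask = ((points[:, 0] >= x_range[0]) & (points[:, 0] < x_range[1]) &
            (points[:, 1] >= y_range[0]) & (points[:, 1] < y_range[1]) &
            (points[:, 2] >= z_range[0]) & (points[:, 2] < z_range[1]))
    pts = points[mask]

    # Discretize x/y into grid cells and z into height slices.
    col = ((pts[:, 0] - x_range[0]) / res).astype(int)
    row = ((pts[:, 1] - y_range[0]) / res).astype(int)
    z_slice = (pts[:, 2] - z_range[0]) / (z_range[1] - z_range[0]) * n_slices
    z_slice = np.clip(z_slice.astype(int), 0, n_slices - 1)

    bev[row, col, z_slice] = 1.0             # occupancy per height slice
    np.add.at(bev, (row, col, n_slices), 1)  # count points per cell
    # Log-normalize the density channel (the constant 16 is an assumption).
    bev[..., n_slices] = np.minimum(1.0, np.log1p(bev[..., n_slices]) / np.log(16))
    return bev
```

The resulting grid can be fed to a 2D convolutional backbone just like an image, which is what makes the BEV representation convenient for detection networks.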


Base Model: We used the Aggregate View Object Detector (AVOD), a two-stage detection network, with slight modifications. The first stage finds all regions likely to contain an object, and the second stage refines those findings into more accurate predictions.
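For intuition, the two-stage control flow can be sketched as below; `score_anchors` and `refine_proposals` are hypothetical stubs standing in for the first- and second-stage networks, not AVOD's actual API.

```python
import numpy as np

def score_anchors(features, anchors):
    # Stage 1 stub: assign each anchor an objectness score.
    return np.random.rand(len(anchors))

def refine_proposals(features, proposals):
    # Stage 2 stub: apply small per-proposal box corrections and re-score.
    offsets = 0.01 * np.random.randn(*proposals.shape)
    scores = np.random.rand(len(proposals))
    return proposals + offsets, scores

def detect(features, anchors, top_k=100, score_thresh=0.5):
    # Stage 1: keep the top-k anchors most likely to contain an object.
    order = np.argsort(score_anchors(features, anchors))[::-1]
    proposals = anchors[order[:top_k]]
    # Stage 2: refine the surviving proposals into final boxes.
    boxes, scores = refine_proposals(features, proposals)
    keep = scores > score_thresh
    return boxes[keep], scores[keep]

anchors = np.random.rand(5000, 6)  # placeholder (x, y, z, w, l, h) anchors
boxes, scores = detect(features=None, anchors=anchors)
```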


Model Training: We extracted features by convolution over the processed images and lidar data, and fused feature-map crops corresponding to the anchor boxes from both modalities. We then performed objectness classification for each crop and regressed the offsets and orientation of the anchor-box crops.
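As an illustration of the fusion step, here is a hedged PyTorch sketch: feature crops are pooled from the BEV and image feature maps at the anchor locations, fused, and passed through classification and regression heads. The class name `FusionHead`, the layer sizes, and the 7x7 crop resolution are illustrative assumptions, not AVOD's exact configuration.

```python
import torch
import torch.nn as nn
import torchvision.ops as ops

class FusionHead(nn.Module):
    def __init__(self, channels=256, crop_size=7):
        super().__init__()
        flat = channels * crop_size * crop_size
        self.fc = nn.Sequential(nn.Linear(2 * flat, 256), nn.ReLU())
        self.cls = nn.Linear(256, 2)   # objectness: object vs. background
        self.reg = nn.Linear(256, 7)   # box offsets (x, y, z, w, l, h) + orientation

    def forward(self, bev_feats, img_feats, bev_rois, img_rois):
        # Pool equal-sized feature crops from both views at the anchor boxes.
        bev_crop = ops.roi_align(bev_feats, bev_rois, output_size=7)
        img_crop = ops.roi_align(img_feats, img_rois, output_size=7)
        # Fuse by concatenation (element-wise mean is another common choice).
        fused = torch.cat([bev_crop.flatten(1), img_crop.flatten(1)], dim=1)
        h = self.fc(fused)
        return self.cls(h), self.reg(h)
```

In training, the objectness output would typically be supervised with a cross-entropy loss and the offset/orientation output with a smooth-L1 regression loss against anchors matched to ground truth.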


Experimentation: We experimented with different approaches to cylindrical projection and to mapping 3D points onto images, and with different sets of anchor boxes for model training.
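One such projection variant can be sketched as follows: each 3D point's azimuth around the vertical axis fixes its horizontal pixel coordinate, and its height divided by its radial distance fixes the vertical one. The coordinate convention (x right, y down, z forward), focal length, and image size here are illustrative assumptions.

```python
import numpy as np

def project_cylindrical(points, f=500.0, width=2048, height=512, cy=256.0):
    """Map (N, 3) points to (u, v) pixel coordinates on a cylindrical image."""
    x, y, z = points[:, 0], points[:, 1], points[:, 2]
    theta = np.arctan2(x, z)                    # azimuth in (-pi, pi]
    r = np.hypot(x, z)                          # radial distance to the cylinder axis
    u = (theta + np.pi) / (2 * np.pi) * width   # wrap 360 degrees onto the image width
    v = cy + f * y / np.maximum(r, 1e-6)        # perspective-style vertical mapping
    valid = (v >= 0) & (v < height)             # points falling inside the image
    return np.stack([u, v], axis=1), valid
```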


Infrastructure and Hardware: We performed data engineering and model training on a local workstation with 20 CPU cores, 128 GB of RAM, and an NVIDIA TITAN RTX GPU.


Results: average precision for detecting cars within a 50 m radius (0.5 3D IoU), and average precision for detecting cars within a 25 m radius (0.5 3D IoU).
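For reference, the 0.5 3D IoU criterion counts a detection as correct when the intersection-over-union of the predicted and ground-truth 3D boxes is at least 0.5. Below is a minimal sketch for axis-aligned boxes; evaluating rotated boxes additionally requires intersecting the footprint polygons and is not handled here.

```python
import numpy as np

def iou_3d_axis_aligned(box_a, box_b):
    """3D IoU for boxes given as (x_min, y_min, z_min, x_max, y_max, z_max)."""
    lo = np.maximum(box_a[:3], box_b[:3])          # intersection lower corner
    hi = np.minimum(box_a[3:], box_b[3:])          # intersection upper corner
    inter = np.prod(np.clip(hi - lo, 0, None))     # zero if boxes are disjoint
    vol_a = np.prod(box_a[3:] - box_a[:3])
    vol_b = np.prod(box_b[3:] - box_b[:3])
    return inter / (vol_a + vol_b - inter)
```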


In the realm of self-driving cars, access to large volumes of high-quality data across many categories is essential for building a model that reliably estimates the surroundings, enabling safer navigation with fewer accidents. Accurately detecting the many classes of objects on the street, such as road signs, road edges, lanes, traffic signals, cars, trucks, and pedestrians, along with their respective travel paths, and predicting and deciding the next action in real time are the fundamental tasks of autonomous operation.

In this initial effort, we combined lidar data with cylindrically stitched visual frames from the six cameras to identify vehicles around the car across the full 360 degrees. This process can be extended to camera-only 3D object detection, tracking, and monitoring with other robotic systems.