978-1-7281-4914-1/19/$31.00 ©2019 IEEE
Multi-Object Tracking of 3D Cuboids Using Aggregated Features
Mircea Paul Muresan, Sergiu Nedevschi
Computer Science Department
Technical University of Cluj-Napoca, Cluj-Napoca, Romania
{mircea.muresan, sergiu.nedevschi}@cs.utcluj.ro
Abstract—The unknown correspondence between measurements and targets, referred to as data association, is one of the main challenges of multi-target tracking. Each new measurement could be the continuation of a previously detected target, the first detection of a new target, or a false alarm. Tracking 3D cuboids is particularly difficult due to the large amount of data, which can include erroneous or noisy sensor information leading to false measurements, detections from an unknown number of objects that may not be consistent across frames, and varying object properties such as dimension and orientation. In the self-driving car context, the target tracking module plays an important role because the ego vehicle has to predict the position and velocity of the surrounding objects in the next time epoch, plan its actions, and make correct decisions. To tackle the above-mentioned problems, as well as other issues arising from the self-driving car processing pipeline, we propose three original contributions: 1) a novel affinity measurement function for associating measurements and targets using multiple types of features coming from LIDAR and camera, 2) a context-aware descriptor for 3D objects that improves the data association process, and 3) a framework that includes a module for tracking the dimensions and orientation of objects. The implemented solution runs in real time, and experiments performed on real-world urban scenarios show that the presented method is effective and robust even in a highly dynamic environment.
Keywords—multi-target tracking; data association; MDP; smoothing trajectories; feature engineering
I. INTRODUCTION
Efficient and reliable perception is one of the core functions through which autonomous vehicles represent the dynamic environment. The ability to effectively detect the surrounding traffic scene plays an important role for many self-driving car components such as collision avoidance, path planning, and localization. In order to navigate successfully, several complex situations have to be addressed, the most difficult being crowded places where multiple static and dynamic objects with various motion behaviors are present. For environment perception, the target tracking process is essential, since the provided measurements are useful only if they are filtered (not noisy) and remain identifiable through occlusions, so that higher-level modules in the processing pipeline can transform each measurement into actionable information.
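The filtering mentioned above can be sketched with a minimal constant-velocity Kalman filter for a single tracked object. The state layout, class name, and noise values below are illustrative assumptions, not the method proposed in this paper:

```python
import numpy as np

class ConstantVelocityKF:
    """Minimal constant-velocity Kalman filter for one tracked object.

    State x = [px, py, vx, vy]; measurements are noisy (px, py) positions.
    All matrices and noise magnitudes are illustrative, untuned values.
    """

    def __init__(self, px, py, dt=0.1):
        self.x = np.array([px, py, 0.0, 0.0])                 # state estimate
        self.P = np.eye(4) * 10.0                             # state covariance
        self.F = np.eye(4)                                    # motion model
        self.F[0, 2] = self.F[1, 3] = dt
        self.H = np.zeros((2, 4))                             # observe position only
        self.H[0, 0] = self.H[1, 1] = 1.0
        self.Q = np.eye(4) * 0.01                             # process noise
        self.R = np.eye(2) * 0.5                              # measurement noise

    def predict(self):
        # Propagate the state one time step ahead.
        self.x = self.F @ self.x
        self.P = self.F @ self.P @ self.F.T + self.Q
        return self.x[:2]                                     # predicted position

    def update(self, zx, zy):
        # Correct the prediction with a new position measurement.
        z = np.array([zx, zy])
        y = z - self.H @ self.x                               # innovation
        S = self.H @ self.P @ self.H.T + self.R
        K = self.P @ self.H.T @ np.linalg.inv(S)              # Kalman gain
        self.x = self.x + K @ y
        self.P = (np.eye(4) - K @ self.H) @ self.P
```

Fed a sequence of positions from an object moving at roughly constant velocity, the velocity components of the state converge toward the true motion, which is what lets the tracker predict the object's position in the next time epoch.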
To address such complex scenarios, which may occur in various weather conditions, multiple types of complementary sensors are usually employed. Chief among them is the LIDAR (Light Detection and Ranging) sensor, used because of its ability to provide accurate positions [1, 2]. Other sensors, such as stereo cameras, can also be used because of their ability to provide the semantic class in addition to the position estimate of objects [3]. The main issue with stereo sensors is that they may not work well under poor illumination, perspective effects, or lack of texture, among others [4]. Radars are another category of range sensors used in autonomous vehicles because of their long detection range and their capacity to accurately measure motion. Their drawback is a reduced field of view and an inability to reliably detect static objects or objects made of porous plastic [5]. Modern perception and tracking architectures usually fuse all the available sensor data to obtain a more comprehensive understanding of the environment. However, one key aspect of any modern architecture is adaptability in the case of sensor failure, where the remaining sensors should still be able to accurately detect and track the road objects. In this paper we address the problem of target tracking using a LIDAR sensor.
We split the challenges of developing a robust tracking algorithm into three categories: high-level processing pipeline challenges, target tracking issues, and time constraints.
The high-level processing pipeline challenges refer to the errors introduced into the target tracking module by the output of other modules in the self-driving car processing pipeline, by inaccurate sensor calibration, or by poor sensor synchronization. The general detection and tracking pipeline includes steps such as point cloud segmentation, candidate matching, and motion estimation [6]. The quality of the point cloud segmentation algorithm directly impacts the quality of the tracking results. Existing methods in the literature work either on 2D grid maps [7] or on 3D occupancy grid maps with a higher computational burden [8]. Incorrect segmentation makes candidate matching and tracking difficult in cluttered scenarios. Common issues of objects obtained by incorrect point cloud segmentation are changes in appearance, unreliable dimensions, and fluctuating positions in consecutive frames.
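The segmentation and candidate-matching steps named above can be sketched schematically. The toy grid-cell segmentation and the greedy nearest-centroid matcher below are illustrative assumptions, not the algorithms used in this paper:

```python
# Schematic versions of two pipeline steps: point cloud segmentation and
# candidate matching. Function names, the grid-cell grouping, and the
# greedy gating strategy are illustrative, not the paper's method.

def segment_point_cloud(points, cell=0.5, min_pts=3):
    """Toy segmentation: group 3D points into objects by 2D grid cell."""
    cells = {}
    for (x, y, z) in points:
        cells.setdefault((int(x // cell), int(y // cell)), []).append((x, y, z))
    # Keep only cells with enough points to form a plausible object.
    return [pts for pts in cells.values() if len(pts) >= min_pts]

def centroid(obj):
    """2D centroid of a segmented object."""
    n = len(obj)
    return (sum(p[0] for p in obj) / n, sum(p[1] for p in obj) / n)

def match_candidates(tracks, detections, gate=2.0):
    """Greedy nearest-centroid association within a distance gate.

    tracks: {track_id: (x, y)} predicted positions.
    detections: list of segmented objects (lists of 3D points).
    Returns (track_id, detection_index) pairs.
    """
    pairs, used = [], set()
    for tid, (tx, ty) in tracks.items():
        best, best_d = None, gate
        for di, det in enumerate(detections):
            if di in used:
                continue
            cx, cy = centroid(det)
            d = ((cx - tx) ** 2 + (cy - ty) ** 2) ** 0.5
            if d < best_d:
                best, best_d = di, d
        if best is not None:
            used.add(best)
            pairs.append((tid, best))
    return pairs
```

Under-segmentation or over-segmentation at the first step shifts the centroids that the matcher relies on, which is one way the issues listed above (fluctuating positions, unreliable dimensions) propagate into the association stage.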
On the other hand, poor synchronization of the LIDAR and camera may lead to an incorrect point cloud projection into the image, which may result in 3D points with an erroneous semantic class. Figure 1 shows such a scenario. On the left-hand side of the image, we can observe the semantic class of each projected 3D LIDAR point in the semantic image. As we can