978-1-7281-4914-1/19/$31.00 ©2019 IEEE

Multi-Object Tracking of 3D Cuboids Using Aggregated Features

Mircea Paul Muresan, Sergiu Nedevschi
Computer Science Department
Technical University of Cluj-Napoca, Cluj-Napoca, Romania
{mircea.muresan, sergiu.nedevschi}@cs.utcluj.ro

Abstract— The unknown correspondence between measurements and targets, referred to as data association, is one of the main challenges of multi-target tracking. Each new measurement could be the continuation of a previously detected target, the first detection of a new target, or a false alarm. Tracking 3D cuboids is particularly difficult due to the large amount of data, which can include erroneous or noisy sensor information leading to false measurements, detections from an unknown number of objects that may not be consistent across frames, and varying object properties such as dimension and orientation. In the self-driving car context, the target tracking module plays an important role because the ego vehicle has to predict the position and velocity of the surrounding objects in the next time epoch, plan actions, and make correct decisions. To tackle the above-mentioned problems, along with other issues arising in the self-driving car processing pipeline, we propose three original contributions: 1) a novel affinity measurement function for associating measurements and targets using multiple types of features coming from LIDAR and camera, 2) a context-aware descriptor for 3D objects that improves the data association process, and 3) a framework that includes a module for tracking the dimensions and orientation of objects. The implemented solution runs in real time, and experiments performed on real-world urban scenarios show that the presented method is effective and robust even in a highly dynamic environment.

Keywords—multi-target tracking; data association; MDP; smoothing trajectories; feature engineering

I.
INTRODUCTION

Efficient and reliable perception is one of the core functions for representing the dynamic environment in autonomous vehicles. The ability to effectively detect the surrounding traffic scenarios plays an important role in many self-driving car components such as collision avoidance, path planning, and localization. In order to navigate successfully, several complex situations have to be addressed, the most difficult being crowded places where multiple static and dynamic objects are present, exhibiting various motion behaviors. For the problem of environment perception, the target tracking process is essential, since the provided measurements are useful only if they are filtered (not noisy) and remain identifiable in occluded situations, so that higher-level modules in the processing pipeline can transform each measurement into actionable information.

To address such complex scenarios, which may occur in various weather conditions, multiple types of complementary sensors are usually employed. Chief among them is the LIDAR (Light Detection and Ranging) sensor, used for its ability to provide an accurate position [1, 2]. Other sensors, such as stereo cameras, can also be used because they provide the semantic class in addition to the position estimate of objects [3]. The main issue with stereo sensors is that they may not work well under bad illumination, perspective effects, or lack of texture, among others [4]. Radars are another category of range sensors used in autonomous vehicles because of their long-range detection ability and their capacity to accurately detect motion. The drawback of radars is that they have a reduced field of view and are not able to reliably detect static objects or objects made of porous plastic [5]. Modern perception and tracking architectures usually fuse all the available sensor data to obtain a more comprehensive understanding of the environment.
However, one key aspect of any modern architecture is adaptability in case of sensor failure: the remaining sensors should still be able to accurately detect and track the road objects. In this paper we address the problem of target tracking using a LIDAR sensor. We split the challenges of developing a robust tracking algorithm into three categories: high-level processing pipeline challenges, target tracking issues, and time constraints. The high-level processing pipeline challenges refer to the errors introduced into the target tracking module by the output of other modules in the self-driving car processing pipeline, by inefficient sensor calibration, or by bad sensor synchronization. The general pipeline of the detection and tracking procedure includes steps such as point cloud segmentation, candidate matching, and motion estimation [6]. The quality of the point cloud segmentation algorithm directly impacts the quality of the tracking results. Existing methods in the literature work either on 2D grid maps [7] or on 3D occupancy grid maps with a higher computational burden [8]. Incorrect segmentation makes candidate matching and tracking difficult in cluttered scenarios. Some common issues of objects obtained by incorrect point cloud segmentation are changes in appearance, unreliable dimensions, and fluctuating positions in consecutive frames. On the other hand, poor synchronization of LIDAR and camera may lead to a bad point cloud projection in the image, which may result in 3D points with an erroneous semantic class. In figure 1 we can see such a scenario. In the left-hand side of the image, we can observe the semantic class of each projected 3D LIDAR point in the semantic image. As we can
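To make the LIDAR-camera projection step concrete, the following minimal Python sketch shows how each 3D LIDAR point can be assigned a semantic class by projecting it into the segmented camera image. The calibration values (intrinsic matrix K, rotation R, translation t) are hypothetical placeholders, not values from this work; in a real system they come from an offline LIDAR-camera calibration procedure. Miscalibration or bad synchronization shifts these projections, producing exactly the erroneous semantic labels described above.

```python
import numpy as np

# Hypothetical calibration; a real system obtains these from an
# offline LIDAR-camera calibration procedure.
K = np.array([[700.0,   0.0, 640.0],   # camera intrinsic matrix
              [  0.0, 700.0, 360.0],
              [  0.0,   0.0,   1.0]])
R = np.eye(3)                          # LIDAR-to-camera rotation
t = np.array([0.0, -0.1, 0.0])         # LIDAR-to-camera translation (m)

def project_lidar_to_image(points_lidar, semantic_image):
    """Project Nx3 LIDAR points into the image plane and read the
    semantic class under each projected pixel. Points behind the
    camera or outside the image are labeled -1 (unknown)."""
    h, w = semantic_image.shape
    pts_cam = (R @ points_lidar.T).T + t       # transform to camera frame
    labels = np.full(len(points_lidar), -1, dtype=int)
    in_front = pts_cam[:, 2] > 0.0             # keep points in front of camera
    uvw = (K @ pts_cam[in_front].T).T          # perspective projection
    u = (uvw[:, 0] / uvw[:, 2]).astype(int)    # pixel column
    v = (uvw[:, 1] / uvw[:, 2]).astype(int)    # pixel row
    valid = (u >= 0) & (u < w) & (v >= 0) & (v < h)
    idx = np.flatnonzero(in_front)[valid]
    labels[idx] = semantic_image[v[valid], u[valid]]
    return labels
```

Note that even a small error in R or t grows into a large pixel offset for distant points, which is why nearby thin structures (poles, pedestrians) are the first to receive wrong classes when calibration drifts.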