Vehicle Detection and Tracking by Collaborative Fusion Between Laser Scanner and Camera

Dominique Gruyer, Aurélien Cord and Rachid Belaroussi

Abstract— This paper presents a new approach to fusing 3D and 2D information in a driver assistance setup, in particular to perform obstacle detection and tracking. We propose a new cooperative fusion method between two exteroceptive sensors: it is able to address highly non-linear dynamic configurations without any assumption on the driving maneuver. Information is provided by a mono-layer laser scanner and a monocular camera, which are unsynchronized. The initial detection stage is performed on the 1D laser data, producing clusters of points that might correspond to vehicles present on the road. These clusters are projected into the image to define targets, which are then tracked using image registration techniques. This multi-object association and tracking scheme is implemented using belief theory integrating temporal and spatial information, which allows the estimation of the dynamic state of the tracks and the monitoring of appearance and disappearance of obstacles. The accuracy of the method is evaluated on a publicly available database, with a focus on the relative localization of the vehicle ahead: estimates of its longitudinal and lateral distances are analysed.

I. INTRODUCTION

For many on-board automotive driver assistance systems (DAS), such as collision avoidance, blind spot monitoring, adaptive cruise control, or parking assistance, robust and reliable vehicle detection is a critical step. On-road vehicle detection concerns systems where sensors are mounted on the vehicle rather than being fixed on the infrastructure, such as cameras for traffic monitoring systems [1]. The most common vehicle detection systems use active sensors: laser, radar or sonar. Such sensors measure the distance of objects from the travel time of an emitted signal after its reflection by the object.
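The detection stage outlined in the abstract (clustering 1D laser impacts, then projecting clusters into the image to define targets) can be sketched as below. This is a minimal illustration, not the paper's implementation: the gap threshold, camera intrinsics and mounting height are all assumed placeholder values.

```python
import math

def cluster_scan(points, gap_threshold=0.5):
    """Group consecutive laser impacts (x, y) in metres into clusters:
    a new cluster starts whenever the gap between neighbouring impacts
    exceeds gap_threshold (an illustrative value)."""
    clusters = []
    current = [points[0]]
    for p, q in zip(points, points[1:]):
        if math.dist(p, q) > gap_threshold:
            clusters.append(current)
            current = []
        current.append(q)
    clusters.append(current)
    return clusters

def project_to_image(p_laser, cam_height=1.2,
                     fu=800.0, fv=800.0, cu=320.0, cv=240.0):
    """Project a laser impact (x forward, y left, in metres) to pixel
    coordinates with an assumed pinhole camera placed cam_height metres
    above the scan plane. Intrinsics fu, fv, cu, cv are placeholders."""
    x, y = p_laser
    X, Y, Z = -y, cam_height, x  # camera frame: X right, Y down, Z forward
    return (cu + fu * X / Z, cv + fv * Y / Z)

# Three close impacts followed by a distant pair yield two clusters,
# each of which could seed one image target.
scan = [(10.0, 0.0), (10.0, 0.1), (10.0, 0.2), (12.0, 3.0), (12.0, 3.1)]
clusters = cluster_scan(scan)
```

In the paper's pipeline the projected cluster footprints then define image regions that are tracked by registration; here the projection simply maps each impact to a pixel.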
Laser scanners are popular sensors for such a purpose [2], [3]: they are usually mounted on the front bumper and perform a horizontal scanning; objects are detected on a given horizontal plane (mono-layer). The data coming from a laser scanner are easier to cluster than radar data, and they are more accurate. Moreover, it is easier to quantify the reliability and to model the uncertainties of such data. However, laser sensors fail in some situations, such as a non-planar road configuration or a varying pitch angle caused by ego-vehicle maneuvers (acceleration) or road shape variations (turns, road bumps . . . ). Radars are less subject to such issues, but their radio waves are reverberated by the walls of a tunnel (wave-guide effect); they can also be reflected by objects that can be safely overridden (a metal plate, a guardrail or a Botts' dot).

Passive sensors such as cameras provide a refined and more complete view of the environment at a lower cost. Visual information is also interesting as recognition of different kinds of shapes can be performed on videos (lane detection, traffic sign recognition, visual odometry, pedestrian detection), so an increasing number of DAS already include one or several on-board cameras. An extensive survey of vision-based approaches for on-road vehicle detection and tracking can be found in [4]. Detection methods are classified into three categories: knowledge-based [5] (edges, corners, colors, texture), stereo-based [6], [7] (disparity, inverse perspective mapping) and motion-based [8] (optical flow). Systems based solely on computer vision are not powerful enough to handle complex traffic situations: multiple sensors, active and passive, are required.

Authors are with IFSTTAR, COSYS, LIVIC, 77 rue des chantiers, F-78000, Versailles, France, e-mail: dominique.gruyer@ifsttar.fr
They can be used in a collaborative way, as in [7]: a stereoscopic camera rig validates the targets provided by a laser scanner, and the outputs of the two filtered sensors are then merged by checking redundancy. In [9], lidar and camera data are processed to provide a set of targets: the sum rule is used to combine the classifier outputs. A more elaborate way of combining a laser rangefinder and a camera is proposed in [1] for a traffic surveillance application (the sensors are fixed on the infrastructure): the telemetric data are incorporated into the likelihood function of a particle filter tracking vehicle motion in the image. In track-to-track fusion systems [10], each local sensor's data are filtered to provide a list of objects sent to a central fusion module, which fuses all the local object lists into a single global object list; local sensor-level tracks are fused asynchronously using the information matrix fusion algorithm. In these works, the issue of data association (identifying which objects from two sensors correspond to the same target) is not raised.

In this paper, we present a new approach to efficiently detect and track on-road vehicles using multiple sensors, namely a laser scanner and a camera; the focus is on the issue of data association of simultaneous measurements from multiple sensors. In our approach, detection and tracking are addressed in a unified framework: targets coming from laser data processing are used to build and manage tracks (tracking stage). This tracking step improves target knowledge by using temporal and spatial information. With a propagation module, a confidence index is computed for each track; this index quantifies the accumulation of temporal evidence about target existence.
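The role of the confidence index, accumulating temporal evidence of a target's existence, can be illustrated by a simple recursive update. This scalar rule is only a stand-in for intuition: the paper's actual propagation module is based on belief theory, and the gain and decay rates below are assumed values.

```python
def update_confidence(conf, detected, gain=0.3, decay=0.2):
    """Scalar stand-in for a belief-theoretic confidence index:
    move towards 1 when the track is matched to a detection,
    decay towards 0 when it is not. gain/decay are illustrative."""
    if detected:
        return conf + gain * (1.0 - conf)
    return conf * (1.0 - decay)

# A track confirmed over several frames gains confidence; a missed
# frame erodes it, letting the tracker manage track appearance and
# disappearance by thresholding this index.
c = 0.0
for seen in [True, True, True]:
    c = update_confidence(c, seen)
```

Thresholding such an index (e.g. confirm a track above 0.6, drop it below 0.1, both arbitrary here) is one common way to turn per-frame detections into stable appearance/disappearance decisions.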
2013 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS), November 3-7, 2013, Tokyo, Japan

Another issue in the field of vehicle detection and tracking is the lack of representative benchmarks and evaluation