Vehicle Detection and Tracking by Collaborative Fusion Between Laser Scanner and Camera

Dominique Gruyer, Aurélien Cord and Rachid Belaroussi

Abstract— This paper presents a new approach to fusing 3D and 2D information in a driver assistance setup, in particular to perform obstacle detection and tracking. We propose a new cooperative fusion method between two exteroceptive sensors: it is able to address highly non-linear dynamic configurations without any assumption on the driving maneuver. Information is provided by a mono-layer laser scanner and a monocular camera, which are unsynchronized. The initial detection stage is performed on the 1D laser data, producing clusters of points that might correspond to vehicles present on the road. These clusters are projected into the image to define targets, which are then tracked using image registration techniques. This multi-object association and tracking scheme is implemented using belief theory integrating temporal and spatial information, which allows the estimation of the dynamic state of the tracks and the monitoring of appearance and disappearance of obstacles. The accuracy of the method is evaluated on a publicly available database, with a focus on the relative localization of the vehicle ahead: estimates of its longitudinal and lateral distances are analysed.

I. INTRODUCTION

For many on-board automotive driver assistance systems (DAS), such as collision avoidance, blind spot monitoring, adaptive cruise control, or parking assistance, robust and reliable vehicle detection is a critical step. On-road vehicle detection concerns systems where sensors are mounted on the vehicle rather than being fixed on the infrastructure, such as cameras for traffic monitoring systems [1]. The most common vehicle detection systems use active sensors: laser, radar or sonar. Such sensors measure the distance of objects from the travel time of an emitted signal after its reflection by the object.
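The detection stage outlined in the abstract (clustering 1D laser impacts, then projecting clusters into the image to define targets) can be sketched as below. This is a minimal illustration, not the paper's implementation: the gap threshold, camera intrinsics and mounting height are all assumed placeholder values.

```python
import math

def cluster_scan(points, gap_threshold=0.5):
    """Group consecutive laser impacts (x, y) in metres into clusters:
    a new cluster starts whenever the gap between neighbouring impacts
    exceeds gap_threshold (an illustrative value)."""
    clusters = []
    current = [points[0]]
    for p, q in zip(points, points[1:]):
        if math.dist(p, q) > gap_threshold:
            clusters.append(current)
            current = []
        current.append(q)
    clusters.append(current)
    return clusters

def project_to_image(p_laser, cam_height=1.2,
                     fu=800.0, fv=800.0, cu=320.0, cv=240.0):
    """Project a laser impact (x forward, y left, in metres) to pixel
    coordinates with an assumed pinhole camera placed cam_height metres
    above the scan plane. Intrinsics fu, fv, cu, cv are placeholders."""
    x, y = p_laser
    X, Y, Z = -y, cam_height, x  # camera frame: X right, Y down, Z forward
    return (cu + fu * X / Z, cv + fv * Y / Z)

# Three close impacts followed by a distant pair yield two clusters,
# each of which could seed one image target.
scan = [(10.0, 0.0), (10.0, 0.1), (10.0, 0.2), (12.0, 3.0), (12.0, 3.1)]
clusters = cluster_scan(scan)
```

In the paper's pipeline the projected cluster footprints then define image regions that are tracked by registration; here the projection simply maps each impact to a pixel.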
Laser scanners are popular sensors for such a purpose [2], [3]: they are usually mounted on the front bumper and perform a horizontal scanning; objects are detected on a given horizontal plane (mono-layer). The data coming from a laser scanner are easier to cluster than radar data, and they are more accurate. Moreover, it is easier to quantify the reliability and to model the uncertainties of such data. However, laser sensors fail in some situations, such as a non-planar road configuration or a varying pitch angle caused by ego-vehicle maneuvers (acceleration) or road shape variations (turns, road bumps . . . ). Radars are less subject to such issues, but their radio waves are reverberated by the walls of a tunnel (wave-guide effect); they can also be reflected by objects that can be safely overridden (a metal plate, a guardrail or a Botts' dot).

Passive sensors such as cameras provide a refined and more complete view of the environment at a lower cost. Visual information is also interesting as recognition of different kinds of shapes can be performed on videos (lane detection, traffic sign recognition, visual odometry, pedestrian detection), so an increasing number of DAS already include one or several on-board cameras. An extensive survey of vision-based approaches for on-road vehicle detection and tracking can be found in [4]. Detection methods are classified into three categories: knowledge-based [5] (edges, corners, colors, texture), stereo-based [6], [7] (disparity, inverse perspective mapping) and motion-based [8] (optical flow). Systems based solely on computer vision are not powerful enough to handle complex traffic situations: multiple sensors, active and passive, are required.

Authors are with IFSTTAR, COSYS, LIVIC, 77 rue des chantiers, F-78000, Versailles, France, e-mail: dominique.gruyer@ifsttar.fr
They can be used in a collaborative way, as in [7]: a stereoscopic camera rig validates the targets provided by a laser scanner, and the outputs of the two filtered sensors are then merged by checking redundancy. In [9], lidar and camera data are processed to provide a set of targets: the sum rule is used to combine the classifier outputs. A more elaborate way of combining a laser rangefinder and a camera is proposed in [1] for a traffic surveillance application (the sensors are fixed on the infrastructure): the telemetric data are incorporated into the likelihood function of a particle filter tracking vehicle motion in the image. In track-to-track fusion systems [10], each local sensor's data are filtered to provide a list of objects sent to a central fusion module, which fuses all the local object lists into a single global object list; local sensor-level tracks are fused asynchronously using the information matrix fusion algorithm. In these works, the issue of data association (identifying which objects from two sensors correspond to the same target) is not raised.

In this paper, we present a new approach to efficiently detect and track on-road vehicles using multiple sensors, namely a laser scanner and a camera; the focus is on the issue of data association of simultaneous measurements from multiple sensors. In our approach, detection and tracking are addressed in a unified framework: targets coming from laser data processing are used to build and manage tracks (tracking stage). This tracking step improves target knowledge by using temporal and spatial information. With a propagation module, a confidence index is computed for each track; this index quantifies the accumulation of temporal evidence about target existence.
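The role of the confidence index, accumulating temporal evidence of a target's existence, can be illustrated by a simple recursive update. This scalar rule is only a stand-in for intuition: the paper's actual propagation module is based on belief theory, and the gain and decay rates below are assumed values.

```python
def update_confidence(conf, detected, gain=0.3, decay=0.2):
    """Scalar stand-in for a belief-theoretic confidence index:
    move towards 1 when the track is matched to a detection,
    decay towards 0 when it is not. gain/decay are illustrative."""
    if detected:
        return conf + gain * (1.0 - conf)
    return conf * (1.0 - decay)

# A track confirmed over several frames gains confidence; a missed
# frame erodes it, letting the tracker manage track appearance and
# disappearance by thresholding this index.
c = 0.0
for seen in [True, True, True]:
    c = update_confidence(c, seen)
```

Thresholding such an index (e.g. confirm a track above 0.6, drop it below 0.1, both arbitrary here) is one common way to turn per-frame detections into stable appearance/disappearance decisions.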
2013 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS), November 3-7, 2013, Tokyo, Japan

Another issue in the field of vehicle detection and tracking is the lack of representative benchmarks and evaluation