Combining Model- and Template-based Vehicle Tracking for Autonomous Convoy Driving

Carsten Fries, Thorsten Luettel and Hans-Joachim Wuensche

Abstract— This paper presents a robust method for vehicle tracking with a monocular camera. A previously published model-based tracking method uses a particle filter, which needs an initial vehicle hypothesis both at system start and in case of a tracking loss. We present a template-based solution that uses several features to estimate a 3D vehicle pose roughly but quickly. Combining model- and template-based object tracking keeps the advantages of each algorithm: precise estimation of the 3D vehicle pose and velocity, combined with a fast (re-)initialization approach. The improved tracking system was evaluated while driving autonomously in urban and unstructured environments. The results show that poorly visible vehicles can be tracked in real-time under varying weather conditions.

I. INTRODUCTION

In recent years, research on autonomous driving has intensified steadily. Many fields of application exist in which advanced driver assistance systems have to detect other vehicles; common examples are collision avoidance systems, adaptive light control and adaptive cruise control. A typical adaptive cruise control scenario is driving on the highway while an autonomous system keeps the distance to the car in front. One or more sensors such as RADAR, LIDAR or cameras are used to obtain information about the surrounding environment [1]–[3]. The measured quantities range from a 1D distance up to the complete 3D position, orientation and velocity. Highways have smooth trajectories with low curvature change, which is a big advantage for systems that measure the vehicle distance. In dense city traffic or in unstructured environments, e.g. small forest tracks, vehicle detection must be more accurate. This paper focuses on a low-cost solution for following a leading vehicle with just one monocular vision sensor (see fig. 1).
It is based on the work on model-based vehicle tracking by Manz et al. [4].

The outline of the paper is as follows: Section I introduces fields of application and related work on vehicle tracking methods. Section II describes our algorithm in three subsections. Experimental results collected while driving in urban and non-urban environments are presented in section III. Finally, conclusions are given in section IV.

All authors are with the department of Aerospace Engineering, Autonomous Systems Technology (TAS), University of the Bundeswehr Munich, Neubiberg, Germany. Contact author email: carsten.fries@unibw.de

Fig. 1: Vehicle tracking for autonomous convoy driving.

A. Related Work

The related work in this paper focuses on non-stationary passive vision systems and their algorithms; i.e. the popular but expensive LIDAR and RADAR sensors will not be considered. Passive vision sensors are frequently used because they are low-priced and have low power consumption. Most publications work with a single camera mounted behind the windshield of the vehicle [5]–[10]. They track multiple vehicles without precise information about each vehicle. The algorithms are primarily designed for motorways with clearly visible road markings, a uniform surface and small changes in road curvature. It is common to train a classifier offline, e.g. [5] train a classifier with the fast-to-compute Haar-like features [6]. They detect the rear sides of highway participants traveling in the same direction. In contrast, [7] train SVM classifiers to detect all sides of a vehicle. They use Histogram of Oriented Gradients (HOG) features and were able to detect any side of a vehicle at intersections. The classifier also responds with a positive detection if a vehicle is only partially visible. In another publication they use essentially the same monocular vehicle detection procedure, but with stereo position estimation [8].
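To illustrate the HOG idea referenced above, the following sketch computes a simplified per-cell gradient-orientation histogram with NumPy. This is not the implementation of [7]; the cell size and number of orientation bins are arbitrary assumptions, and real detectors add block normalization and a trained classifier on top.

```python
import numpy as np

def hog_features(image, cell=8, bins=9):
    """Simplified Histogram of Oriented Gradients (illustrative only).

    image: 2D grayscale array; cell: cell edge length in pixels;
    bins: orientation bins over the unsigned range [0, 180) degrees.
    """
    gy, gx = np.gradient(image.astype(float))        # image gradients
    mag = np.hypot(gx, gy)                           # gradient magnitude
    ang = np.rad2deg(np.arctan2(gy, gx)) % 180.0     # unsigned orientation

    h, w = image.shape
    ch, cw = h // cell, w // cell
    feats = np.zeros((ch, cw, bins))
    for i in range(ch):
        for j in range(cw):
            m = mag[i * cell:(i + 1) * cell, j * cell:(j + 1) * cell]
            a = ang[i * cell:(i + 1) * cell, j * cell:(j + 1) * cell]
            idx = (a / 180.0 * bins).astype(int) % bins
            for b in range(bins):                    # magnitude-weighted vote
                feats[i, j, b] = m[idx == b].sum()
    # L2-normalize each cell histogram for illumination invariance
    norm = np.linalg.norm(feats, axis=2, keepdims=True) + 1e-9
    return (feats / norm).reshape(-1)

# A vertical edge concentrates its energy in a single orientation bin.
img = np.zeros((16, 16))
img[:, 8:] = 1.0
f = hog_features(img)
print(f.shape)  # (2 * 2 * 9,) = (36,)
```

In a detection pipeline such feature vectors, computed over a sliding window, would be fed to an SVM that scores each window as vehicle or background.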
The vehicle is detected in each camera image and its location is extracted from the depth map created by the stereo system. Other publications focus on stereo-based techniques where objects are segmented, detected and tracked in 3D point clouds [9], [10]. Some authors analyze the optical flow, e.g. to detect vehicles driving in the opposite direction; if the ego-motion is known, vehicles driving in the same direction are detectable as well [5]. Furthermore, filters are often used, e.g. Kalman filtering is applied to estimate state values over time [8]. Within the last few years the multidimensional particle filter has become very popular. The particle filter can verify up to thousands of hypotheses for the object of interest. Notably, only a few published approaches, based on manually generated 3D vehicle models, have achieved robust tracking results [4], [11].
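The hypothesis-testing idea behind the particle filter can be sketched in one dimension. This is a minimal illustration, not the 3D pose tracker of [4]: the state (a scalar distance to the lead vehicle), the Gaussian process and measurement noise levels, and all numeric parameters are assumptions chosen for the example.

```python
import numpy as np

rng = np.random.default_rng(0)

def particle_filter_step(particles, weights, measurement,
                         process_std=0.5, meas_std=1.0):
    """One predict-weight-resample cycle of a 1D particle filter.

    particles: (N,) state hypotheses (here: distance to the lead vehicle)
    weights:   (N,) normalized particle weights
    """
    # Predict: propagate every hypothesis with process noise.
    particles = particles + rng.normal(0.0, process_std, particles.size)
    # Update: reweight hypotheses by a Gaussian measurement likelihood.
    weights = weights * np.exp(-0.5 * ((particles - measurement) / meas_std) ** 2)
    weights = weights / weights.sum()
    # Resample: draw particles in proportion to their weights.
    idx = rng.choice(particles.size, size=particles.size, p=weights)
    return particles[idx], np.full(particles.size, 1.0 / particles.size)

# Track a (stationary) true distance of 10 m from noisy range measurements.
N = 1000
particles = rng.uniform(0.0, 20.0, N)      # initial hypotheses, no prior
weights = np.full(N, 1.0 / N)
for _ in range(20):
    z = 10.0 + rng.normal(0.0, 1.0)        # simulated noisy measurement
    particles, weights = particle_filter_step(particles, weights, z)
estimate = particles.mean()
print(round(estimate, 1))  # close to 10.0
```

A real tracker as in [4] evaluates each hypothesis over a full 3D pose by rendering a vehicle model and comparing it against the camera image, but the predict-weight-resample cycle is the same.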