ROBUST VEHICLE TRACKING IN VIDEO IMAGES BEING TAKEN FROM A HELICOPTER

Fatemeh Karimi Nejadasl, Ben G.H. Gorte, and Serge P. Hoogendoorn

Institute of Earth Observation and Space Systems, Delft University of Technology, Kluyverweg 1, 2629 HS, Delft, The Netherlands
f.KarimiNejadasl, b.g.h.gorte@tudelft.nl

Transport and Planning Section, Delft University of Technology, Stevinweg 1, 2628 CN, Delft, The Netherlands
S.P.Hoogendoorn@tudelft.nl

Commission VII

KEY WORDS: Optical Flow, Tracking, Feature Detection, Matching, Region Based, Feature Based

ABSTRACT:

Measuring positions, velocities and accelerations/decelerations of individual vehicles in congested traffic with standard traffic monitoring equipment, such as inductive loops, is not feasible. The behavior of drivers in different traffic situations, as required for microscopic traffic flow models, is still not sufficiently known. Remote sensing and computer vision technology have recently been used to address this problem. In this study we use video images taken from a helicopter above a fixed point of the highway. We address the problem of tracking the movement of previously detected vehicles through a stabilized video sequence. We combine two approaches, optical flow and matching-based tracking, and improve them by adding constraints and using scale space. Feature elements, i.e. the corners, lines, regions and outlines of each car, are extracted first. Then, optical flow is used to find, for each pixel in the interior of a car, the corresponding pixel in the next image, by inserting the brightness model. Normalized cross-correlation matching is used at the corners of the car. Different pixels are used for solving the aperture problem of optical flow and for the template matching area: neighboring pixels and feature pixels. The image boundary, road line boundaries, the maximum speed of the car, and the positions of surrounding cars are used as constraints.
Ideally, every pixel of a car should yield the same displacement, because cars are rigid objects.

1. INTRODUCTION

Traffic congestion is an important problem in modern society. A lot of money and time is wasted in traffic jams, and car crashes and accidents are more frequent during busy traffic conditions. Several efforts are made to tackle this problem: better facilities and regulations should improve the situation on existing roads, while the road network is extended as well.

Traffic congestion is highly dependent on the behavior of individual drivers. For example, reaction times and lane-changing techniques vary from driver to driver. Therefore it is useful to model the behavior of individual drivers, as well as the interaction between drivers, before new decisions and regulations for traffic congestion control are initiated. Current traffic theories are not yet able to correctly model the behavior of drivers during congested or nearly congested traffic flow while taking individual drivers' behavior into account. For this, so-called microscopic traffic models are needed. Vast amounts of data are required to set up those models and determine their parameters.

Traffic parameter extraction from airborne video data has recently been gaining popularity. Automatic extraction of traffic parameters is a computer vision task. For traffic parameter extraction, information about each vehicle is needed during the period of time the vehicle is present in the scene. A possible solution is to detect a vehicle in a video frame when it enters the scene and then track it in successive frames.

The video is recorded by a camera mounted on a helicopter. Since we want to model the behavior of as many vehicles (drivers) as possible, we attempt to cover a large highway section, leading to the lowest spatial resolution that accuracy requirements allow. Typically we use a spatial resolution (pixel size) between 25 and 50 cm.
Helicopter movement introduces camera motion in addition to object (i.e. vehicle) motion. We have removed camera motion with the methods described in (Hoogendoorn et al., 2003) and (Hoogendoorn et al., 2003). Unwanted areas outside the road boundary are eliminated with the method of (Gorte et al., 2005).

In earlier work, vehicles were detected by a difference method (Hoogendoorn et al., 2003), which requires the involvement of an operator when automatic detection fails. This is often the case for cars having low contrast against the background (dark cars on a dark road surface). We used cross-correlation matching for tracking. This works well in the case of distinct features with homogeneous movements, and it is less sensitive to illumination changes. However, it is too sensitive to similarities in texture or brightness.

To improve the performance of tracking, we investigate the use of optical flow methods in this paper. An improvement with respect to least-squares matching (Atkinson, 1996) is expected because of the additional time element in the optical flow equation. The optical flow method is sensitive to small (even sub-pixel) movements. This sensitivity may be helpful for tracking cars that are similar to the background.

The paper is organized as follows. In Section 2 we present related work. Section 3 discusses the cross-correlation matching method; in Section 4 a gradient-based optical flow method, assuming a constant or linear model of brightness, is discussed. Feature selection and constraints are described in the result redundancy exploitation section. We give results in Section 6 and conclusions in Section 7.

2. RELATED WORK

Automatic object tracking receives attention in computer vision for a very diverse range of applications. Matching methods are widely used in video tracking. As mentioned earlier, they are quite good for distinctive objects.
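To make the cross-correlation tracking idea concrete, the following sketch (a minimal illustration assuming NumPy; the function names and window sizes are ours, not the authors' implementation) matches a small template around a car corner against a search window in the next frame using zero-mean normalized cross correlation:

```python
import numpy as np

def ncc(template, patch):
    """Zero-mean normalized cross correlation between two equal-sized patches."""
    t = template - template.mean()
    p = patch - patch.mean()
    denom = np.sqrt((t ** 2).sum() * (p ** 2).sum())
    return (t * p).sum() / denom if denom > 0 else 0.0

def track_corner(frame0, frame1, corner, tsize=5, search=7):
    """Find the (row, col) displacement of `corner` from frame0 to frame1.

    A (2*tsize+1)^2 template around the corner is compared against every
    candidate position within +/- `search` pixels in the next frame; the
    position with the highest NCC score wins.
    """
    r, c = corner
    tpl = frame0[r - tsize:r + tsize + 1, c - tsize:c + tsize + 1]
    best_score, best_dr, best_dc = -1.0, 0, 0
    for dr in range(-search, search + 1):
        for dc in range(-search, search + 1):
            rr, cc = r + dr, c + dc
            patch = frame1[rr - tsize:rr + tsize + 1, cc - tsize:cc + tsize + 1]
            if patch.shape != tpl.shape:
                continue  # candidate window falls outside the image boundary
            score = ncc(tpl, patch)
            if score > best_score:
                best_score, best_dr, best_dc = score, dr, dc
    return (best_dr, best_dc), best_score
```

The zero-mean normalization is what makes the match relatively insensitive to illumination changes, while the exhaustive search over similar-looking patches is exactly where the texture/brightness-similarity sensitivity noted above comes from. Constraints such as the road boundary or a vehicle's maximum speed would shrink the search window.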
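The aperture problem mentioned in the abstract (one brightness-constancy equation per pixel, two unknown velocity components) is classically resolved by pooling the equations of neighboring pixels and solving them in a least-squares sense, as in the Lucas-Kanade method. A minimal sketch under the constant-brightness assumption, again with NumPy and our own hypothetical function names rather than the paper's implementation:

```python
import numpy as np

def local_flow(frame0, frame1, center, half=4):
    """Estimate the (row, col) displacement at `center` from the
    brightness-constancy constraint Ix*u + Iy*v + It = 0, stacked
    over a (2*half+1)^2 neighborhood and solved by least squares."""
    f0 = frame0.astype(float)
    f1 = frame1.astype(float)
    # central-difference spatial gradients and temporal difference
    Iy, Ix = np.gradient(f0)
    It = f1 - f0
    r, c = center
    win = (slice(r - half, r + half + 1), slice(c - half, c + half + 1))
    # one equation per pixel in the window: [Iy Ix] . [v u]^T = -It
    A = np.stack([Iy[win].ravel(), Ix[win].ravel()], axis=1)
    b = -It[win].ravel()
    flow, *_ = np.linalg.lstsq(A, b, rcond=None)
    return flow  # (v, u): displacement in rows and columns
```

Because the constraint is a first-order linearization, this recovers sub-pixel displacements on smooth image patches, which is the sensitivity to small movements that motivates using optical flow for low-contrast cars; large motions would require the scale-space (coarse-to-fine) treatment mentioned in the abstract.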
ISPRS Commission VII Mid-term Symposium "Remote Sensing: From Pixels to Processes", Enschede, the Netherlands, 8-11 May 2006

However