The State of the Art in Multiple Object Tracking Under Occlusion in Video Sequences

Pierre F. Gabriel, Jacques G. Verly, Justus H. Piater, and André Genon
{p.gabriel, jacques.verly, justus.piater, agenon}@ulg.ac.be
Department of Electrical Engineering and Computer Science, University of Liège, Belgium.

Abstract

In this paper, we present a review of existing techniques and systems for tracking multiple occluding objects using one or more cameras. Following a formulation of the occlusion problem, we divide these techniques into two groups: merge-split (MS) approaches and straight-through (ST) approaches. Then, we consider tracking in ball game applications, with emphasis on soccer. Based on this assessment of the state of the art, we identify what appear to be the most promising approaches for tracking in general and for soccer in particular.

1. Introduction

There has been considerable research activity on the tracking of objects from video sequences over the last 20 years. This interest is motivated by numerous applications, such as surveillance, video conferencing, man-machine interfaces, and sports enhancement.

We consider the problem of simultaneously tracking one or more objects in one or more video sequences. In particular, we focus on the cases where two or more objects occlude each other, either partially or completely. Note that these objects can be rigid (e.g., cars) or deformable (e.g., persons). They can also be fixed (e.g., a column) or mobile, in which case they can be stationary or in motion.

To evaluate and compare the capabilities of the various video tracking systems developed by others for diverse applications, we have found it useful to develop formal notions of objects, groups of objects ("blobs"), and occlusions. These concepts are presented in Section 2. Based on this framework, we identify the primary signal processing components that any robust generic video tracking system should have.
The first part of the discussion is centered on the use of a single camera. Then, we consider the added value of using multiple cameras. For each processing component, we list the main techniques that have been used by others. In each case, we make reference to the specific systems that use these techniques. This material is covered in Sections 3 and 4. As a concrete illustration, we consider, in Section 5, the particular application of video tracking in soccer and review the specialized techniques that have been proposed by others in this domain. Finally, in Section 6, we identify the techniques that appear the most promising for dealing with occlusions in general and with soccer applications in particular.

2. Formulation of the occlusion problem

Tracking objects in crowded scenes necessarily leads to the problem of occlusion. Discussing this problem and comparing the capabilities of the various existing video tracking systems would be much easier if we could cast the occlusion problem in some formal framework. To the best of our knowledge, no such framework exists in the literature. As a result, we propose here a simple, first-cut framework that will allow us, first, to describe the problem of occlusion in generic terms and, then, to classify and compare the existing tracking systems that deal with occlusions.

The primary entity is the "blob" (sometimes called a target), which is defined as a group of "objects". The exact nature of the objects is irrelevant; they can be persons, cars, columns, etc. Figure 1 shows the graphical representation of blobs and objects. It is important to note that a blob acts as a container that can hold one or more objects. It is also important to understand that the things that are detected, via image processing, and tracked, whether in the absence or in the presence of occlusions, are blobs, not objects.
A blob could be recursively defined as being a group of blobs, instead of objects, but we do not feel this is necessary at this time.
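
The blob-as-container idea can be illustrated with a short sketch. This is only one possible reading of the definition above, not a data structure taken from any of the reviewed systems: the names `Blob` and `merge`, and the use of strings as object labels, are assumptions made for illustration. The sketch keeps a blob flat (a set of objects, not of blobs) and requires at least one object per blob.

```python
from dataclasses import dataclass
from typing import FrozenSet


@dataclass(frozen=True)
class Blob:
    """A detected and tracked entity that contains one or more objects.

    Object labels are plain strings here (an assumption for illustration).
    """
    objects: FrozenSet[str]

    def __post_init__(self) -> None:
        # By definition, a blob holds at least one object.
        if not self.objects:
            raise ValueError("a blob must contain at least one object")


def merge(a: Blob, b: Blob) -> Blob:
    """Hypothetical helper: combine two blobs into a single blob,
    e.g. when their objects occlude each other. The result stays flat:
    it holds the union of the objects, not the two input blobs."""
    return Blob(a.objects | b.objects)


# Two single-object blobs that come to occlude each other.
blob_a = Blob(frozenset({"player_1"}))
blob_b = Blob(frozenset({"player_2"}))
group = merge(blob_a, blob_b)
print(sorted(group.objects))
```

In this sketch the tracker would only ever handle `Blob` instances; which objects a blob contains is bookkeeping attached to it, in line with the remark that blobs, not objects, are what is detected and tracked.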