Multi Person Tracking Within Crowded Scenes

Andrew Gilbert and Richard Bowden

University of Surrey, Guildford, Surrey, GU2 7XH, UK
{a.gilbert,r.bowden}@Surrey.ac.uk

Abstract. This paper presents a solution to the problem of tracking people within crowded scenes. The aim is to maintain individual object identity through a crowded scene which contains complex interactions and heavy occlusions of people. Our approach uses the strengths of two separate methods: a global object detector and a localised frame-by-frame tracker. A temporal relationship model of torso detections, built during low-activity periods, is used to further disambiguate during periods of high activity. A single camera with no calibration and no environmental information is used. Results are compared to a standard tracking method and ground truth. Two video sequences containing interactions, overlaps and occlusions between people are used to demonstrate our approach. The results show that our technique performs better than a standard tracking method and can cope with challenging occlusions and crowd interactions.

1 Introduction

Visual surveillance systems are commonly placed in large areas of high traffic such as airports, rail stations and shopping centres. Tracking individuals within crowds remains a difficult problem due to the complex interactions and occlusions that occur. This paper presents an approach to tracking individuals and retaining object identity through occlusions and object interactions, within a single camera. Most existing methods of tracking individuals in the area of visual surveillance involve the segmentation of foreground objects from a modelled background. These methods often fail when tracking an individual in a crowded scene, since individuals cannot be easily segmented in isolation from the background. There are two categories of technique that address this problem. Frame-by-frame trackers are highly accurate for scenes with little or no occlusion.
Object detectors, by contrast, work well at recognizing specific objects in individual frames. The method proposed in this paper is therefore designed to use both and take advantage of the strengths of each. Many solutions to the problem of tracking multiple objects have been presented, which can be categorised as either single or multiple camera approaches.

A. Elgammal et al. (Eds.): Human Motion 2007, LNCS 4814, pp. 166–179, 2007. © Springer-Verlag Berlin Heidelberg 2007

1.1 Multiple Camera Tracking

The use of multiple wide baseline cameras allows simpler occlusion reasoning and can allow for a 3D environment to be built of the scene through camera