EXTRACTING STRUCTURAL FRAGMENTS FROM IMAGES SHOWING OVERLAPPING PEDESTRIANS

László Havasi*, Csaba Benedek**, Zoltán Szlávik** and Tamás Szirányi**

*Péter Pázmány Catholic University, H-1052 Budapest Piarista köz 1., Hungary, e-mail: havasi@digitus.itk.ppke.hu
**Analogic and Neural Computing Laboratory, Hungarian Academy of Sciences, H-1111 Budapest, Kende u. 13-17, Hungary, e-mail: {szlavik, benedek, sziranyi}@sztaki.hu

ABSTRACT

This paper outlines and demonstrates a new algorithm that extracts characteristic fragments of the body outline of human figures from video image sequences, even in the non-ideal case of typical outdoor illumination conditions and camera positions. Our method can derive relevant information about the significant body elements from video sequences showing walking people, without imposing severe or unusual constraints on the input images. The proposed algorithm connects featured parts in the image into symmetrical objects, tracks them, and generates derived spatio-temporal statistical features which are used to ensure stable tracking results. Our method is fast enough for real-time use. Using the grouped dual-point approach outlined here, we can extract biometric information suitable for subsequent analysis of walking-gait characteristics, even in the case of overlapping and transient image outlines.

KEY WORDS

Motion analysis, symmetry tracking, structure of fragments, gait detection, image morphology

1. Introduction

Automatic detection of humans and body-part localisation are important but challenging problems in computer vision. Human motion analysis and tracking have long been proposed for applications in surveillance [1]. The primary step in the analysis and tracking of human motion is the modelling of the moving people represented in image sequences. Several approaches have been proposed for such modelling: e.g.
elliptical cylinders [2], configurations of parameterised primitives [3], or 3-D tracking [4]. However, these methods are too complicated for effective detection of human figures in practical conditions [5]. Other common methods are the shape decomposition method [6] and the skeleton-based representation [7], which has been used to model the topological structure of the body. Contour-based representations can be extended with deformable templates to handle shape deformations [8]. However, a drawback of such shape methods is that the model and the extracted image contour must first be aligned, which is not a trivial task. In addition, these methods cannot model individual parts of the body, so they can handle only a limited variety of shapes. In [9] a blob-based representation is introduced, which is useful in colour images. That method can successfully separate different people in the same image, provided that they are wearing distinct clothes; but it cannot extract detailed information about the various parts of the body.

The foundation of motion analysis is motion tracking. This task is very important because increased precision in tracking brings considerable improvements in recognition accuracy. This improved precision can be achieved by using the above methods in conjunction with spatio-temporal analysis [10]. Kalman filtering [11] is a widely used stochastic modelling method employed to handle occlusion and articulated motion. Problems commonly arise in situations where partitioning the motions and the people in the input images is not trivial. Nevertheless, in our examples we are able to successfully analyse real images similar to those obtained in practice from city-wide surveillance systems. We used high-resolution (720x576 pixel), wide-angle cameras to observe human figures in busy outdoor locations; and in these quite realistic circumstances the resolution and contrast of the body outlines of persons in the image are often rather poor.
We also note that most publications focus on cases where only one, or at most a few, people are moving in the scene being analysed [12][17]. Our present method, on the other hand, is quite successful for multi-person images.

Our ultimate goal is to track moving people in complex scenes, with the help of biometric information derived from the images. The dynamic properties of walking uniquely characterise a moving human figure [13]. In future work we plan to use these attributes for the analysis of images obtained in a multi-camera environment [19]. Thus our main thrust in the present paper is not to detect