SHAPE-BASED DETECTION OF HUMANS FOR VIDEO SURVEILLANCE APPLICATIONS

Herbert Ramoser 1, Thomas Schlögl 1, Csaba Beleznai 1, Martin Winter 2, and Horst Bischof 3

1 Advanced Computer Vision GmbH, Vienna, Austria
{herbert.ramoser, thomas.schloegl, csaba.beleznai}@acv.ac.at
2 Siemens AG Österreich, Programm- und Systementwicklung, Graz, Austria
3 Institute for Computer Graphics and Vision, Univ. of Technology, Graz, Austria

ABSTRACT

In this paper we describe a surveillance system that not only detects and tracks blobs but also determines whether a blob is a person. The given blob is segmented into sub-regions. A person model is fit to these regions such that a likelihood measure is maximized. The likelihood measure depends on the number of identified body parts, their length, location, and aspect ratio. The method is translation, rotation, and scaling invariant and computationally efficient. The results obtained for test video sequences are very encouraging.

1. INTRODUCTION

Detecting the presence of humans is important, for example, in safe robot navigation and automatic video surveillance. In this paper we focus on video surveillance applications with stationary cameras, where human detection is generally used as an assistive technology for the system operator. A computer-based surveillance system must be able to reliably detect a possible intruder and alert the system operator. The method should be robust with respect to the wide range of appearances of persons (e.g., due to clothing), illumination conditions, and background scenes.

The detection algorithm is integrated into a surveillance system which is currently under development [1]. The system detects foreground pixels using a model of the background scene. The foreground pixels are grouped into blobs, which are used for further analysis. All non-foreground pixels are used to update the background model.
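The foreground detection, blob grouping, and selective background update described above can be sketched as follows. This is only an illustrative sketch: the paper does not specify the background model here, so the running-average model, the fixed difference threshold, the blending factor `alpha`, the 4-connected grouping, and all function names are assumptions made for the example.

```python
from collections import deque


def foreground_mask(frame, background, threshold=25):
    """Mark pixels whose absolute difference from the background exceeds the threshold."""
    return [[abs(p - b) > threshold for p, b in zip(frow, brow)]
            for frow, brow in zip(frame, background)]


def label_blobs(mask):
    """Group foreground pixels into 4-connected blobs (lists of (row, col) pixels)."""
    h, w = len(mask), len(mask[0])
    seen = [[False] * w for _ in range(h)]
    blobs = []
    for y in range(h):
        for x in range(w):
            if not mask[y][x] or seen[y][x]:
                continue
            # Breadth-first search collects all pixels connected to (y, x).
            seen[y][x] = True
            queue, blob = deque([(y, x)]), []
            while queue:
                cy, cx = queue.popleft()
                blob.append((cy, cx))
                for ny, nx in ((cy - 1, cx), (cy + 1, cx), (cy, cx - 1), (cy, cx + 1)):
                    if 0 <= ny < h and 0 <= nx < w and mask[ny][nx] and not seen[ny][nx]:
                        seen[ny][nx] = True
                        queue.append((ny, nx))
            blobs.append(blob)
    return blobs


def update_background(frame, background, mask, alpha=0.05):
    """Blend the frame into the background at non-foreground pixels only."""
    return [[b if m else (1 - alpha) * b + alpha * p
             for p, b, m in zip(frow, brow, mrow)]
            for frow, brow, mrow in zip(frame, background, mask)]


# Toy 4x5 grayscale scene: a uniform background and one bright object.
background = [[10.0] * 5 for _ in range(4)]
frame = [row[:] for row in background]
frame[1][1] = frame[1][2] = frame[2][1] = 200.0  # object pixels
frame[0][4] = 20.0  # small illumination change, below the threshold

mask = foreground_mask(frame, background)
blobs = label_blobs(mask)
background = update_background(frame, background, mask)
```

Updating the model only where the mask is false keeps a detected object from being absorbed into the background, while small illumination changes are still tracked.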
In a second stage, a basic camera calibration allows every blob to be classified by its size as either a (potential) single person or another object (e.g., a group of persons, a car, etc.). All single-person-sized blobs are subjected to further analysis by the algorithm presented in this paper.

The remainder of the paper is structured as follows. First, we give a brief overview of related literature. Section 2 describes the shape segmentation and model fitting algorithm in detail. Section 3 presents some encouraging results on test video sequences. Finally, Section 4 concludes with a summary of advantages and limitations and suggestions for further improvements.

This work has been carried out within the K plus Competence Center ADVANCED COMPUTER VISION. This work was funded from the K plus Program.

1.1. Related Work

The detection of humans in images or video sequences has attracted growing attention from several research groups. Three distinct detection methods can be distinguished: shape-based, color- and texture-based, and motion-based. Shape-based analysis is the most widely used approach. Shape-based methods include simple blob area measures [2], projection histograms [3, 4], clustering of statistical shape descriptors [5], Fourier and wavelet silhouette descriptors [6, 7], and fitting of a human model to the blob [8, 9, 10]. Few methods use color and texture features to detect humans [11, 12]. All methods mentioned so far operate on static images. An alternative is to use the unique motion pattern of a walking human for classification [13, 14].

Most of the published methods have severe problems in detecting clothed or partially occluded humans. The most promising exception is described in [10]: a human model is fit to the image blob such that a likelihood measure is maximized. Varying clothing and occlusions are accommodated by the dynamic model assembly.
In this paper we present an improved version of this algorithm, which increases the processing speed by detecting the body parts in a single iteration and by using a modified calculation of the model likelihood.

2. METHODS

2.1. System Architecture

The human detection algorithm is integrated into a surveillance system which is currently under development [1]. The analysis performed by the system is outlined in Fig. 1. The basic steps are:

Frame acquisition: Frames are captured at regular time intervals, typically at a rate of 6 to 12 frames per second.

Fig. 1. Principal steps of the motion analysis and human detection process.