The AIT Face Tracker for VACE Multisite Meeting Recordings

Andreas Stergiou, Ghassan Karame, Aristodemos Pnevmatikakis and Lazaros Polymenakos

Athens Information Technology, Autonomic and Grid Computing,
P.O. Box 64, Markopoulou Ave., 19002 Peania, Greece
{aste, gkar, apne, lcp}@ait.edu.gr
http://www.ait.edu.gr/research/RG1/overview.asp

Abstract. This paper describes the AIT system for 2D face tracking and the results obtained in the VACE multisite meeting recordings section of the CLEAR 2007 evaluations. The system is based on the complementary operation of a set of face detectors and a tracker. To minimize false positives, the system utilizes a detection validation scheme based on color.

1 Introduction

Tracking and recognizing people is very important for applications such as surveillance, security and human-machine interfaces. In the visual modality, faces are the most commonly used cue for recognition. Locating the faces also helps separate human bodies that the tracker has merged into a single target. Hence face localization is of paramount importance in many applications.

Face localization can be done on a single camera frame by means of a detector, or across multiple frames using a tracker. Either task can become very difficult in far-field, unconstrained recording conditions. Low-resolution faces subject to pose, illumination and expression variations, as well as occlusions, can only be detected sporadically. The misses need to be accounted for by a tracker, whose model needs frequent updating to cope with the ever-changing appearance of the face. Such face detectors also suffer from false alarms, which need to be suppressed as much as possible.

Face detection can be very accurate [1,2], given high resolution, a nearly frontal pose and a generous processing-time allowance. Unfortunately, none of these conditions holds in the intended application, where resolution is low, pose can be arbitrary and processing has to be real-time.
Cascades of simple classifiers [3] are fast and can detect small faces against arbitrary backgrounds. An ensemble of such cascades can be trained, each on a different pose, so that together they serve as a multi-view face detector.

Two approaches can be used in face tracking: stochastic and deterministic. Stochastic trackers are based on recursive Bayesian filtering, either in its exact form for Gaussian states and linear dynamics, the Kalman filter [4], or in its numerical approximation for non-linear dynamics, the particle filter [5]. Deterministic tracking, on the other hand, minimizes a cost function related to template matching between the