A Stereo and Color-based Method for Face Pose Estimation and Facial Feature Extraction

Robert Niese, Ayoub Al-Hamadi, Bernd Michaelis
Institute for Electronics, Signal Processing and Communications (IESK)
Otto-von-Guericke-University Magdeburg
robert.niese@et.uni-magdeburg.de

Abstract

This paper describes a method for face pose estimation and high-resolution facial feature extraction on the basis of stereoscopic color images. Unlike other approaches, no light projection is required at run time. In our method, face detection is based on color-driven clustering of 3D points derived from stereo. A mesh model is registered with the post-processed face cluster using a variant of the Iterative Closest Point (ICP) algorithm, and pose is derived from the resulting correspondence. Pose and model information is then used for face normalization and facial feature localization. Results show that stereo and color are powerful cues for finding the face and its pose under a wide range of poses, illuminations and expressions (PIE). Head orientation may vary in out-of-plane rotations of up to ±45°.

1. Introduction

Automatic analysis of human faces is an active research field in computer vision. Applications include face recognition for biometrics, facial expression analysis, video conferencing and human-computer interaction [1].

We are currently implementing a new application that observes and analyzes single faces of post-operative patients using a passive high-resolution stereo camera system. To meet the robustness requirements, exact localization of the face and its orientation is necessary in the first processing step. Challenges arise from the fact that the persons observed are non-cooperative. Consequently, the face processing system must be capable of dealing with oblique head orientations including arbitrary rotation, changing illumination conditions, and shape variation caused by facial expressions. In the literature these requirements are known as PIE (pose, illumination, expression) [1].
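The color-driven clustering of stereo-derived 3D points summarized in the abstract can be sketched as follows. This is a minimal illustration, not the paper's implementation: the skin-color thresholds in normalized r-g space and the greedy Euclidean clustering radius are assumed values chosen for the example, and the paper's actual color model and post-processing are not reproduced here.

```python
import numpy as np

def skin_mask(colors):
    """Rough skin-color test in normalized r-g space (illustrative
    thresholds, not the paper's calibrated color model)."""
    rgb = colors.astype(float)
    s = rgb.sum(axis=1) + 1e-6
    r, g = rgb[:, 0] / s, rgb[:, 1] / s
    return (r > 0.35) & (r < 0.55) & (g > 0.25) & (g < 0.40)

def largest_cluster(points, radius=0.02):
    """Greedy Euclidean clustering of 3D points; returns the indices of
    the largest cluster, a stand-in for the face-cluster selection."""
    n = len(points)
    unvisited = np.ones(n, dtype=bool)
    best = np.array([], dtype=int)
    for seed in range(n):
        if not unvisited[seed]:
            continue
        unvisited[seed] = False
        cluster, frontier = [seed], [seed]
        while frontier:
            i = frontier.pop()
            d = np.linalg.norm(points - points[i], axis=1)
            nbrs = np.where(unvisited & (d < radius))[0]
            unvisited[nbrs] = False
            cluster.extend(nbrs.tolist())
            frontier.extend(nbrs.tolist())
        if len(cluster) > len(best):
            best = np.array(cluster)
    return best

def detect_face_points(points, colors, radius=0.02):
    """Keep skin-colored 3D points, then take the largest spatial cluster."""
    idx = np.where(skin_mask(colors))[0]
    return idx[largest_cluster(points[idx], radius)]
```

The brute-force neighbor search is O(n²) and serves only to make the two-stage idea explicit: color partitions the point set, and spatial clustering isolates the single face region.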
In the past, many techniques have been developed to detect and track faces and facial features. Appearance-based methods have been used successfully to detect faces in images on the basis of trained classifiers. Deformable Templates, Statistical Models, Active Appearance and Combined Shape Models are useful means for detection and tracking of facial regions [1]. Usually, edges, intensity maxima and features guide the matching process of such models. Interesting facial regions are the eyes, eyebrows, mouth and nose. However, to maximize usability and minimize cost and complexity, many techniques attempt to perform face analysis with minimal input data, i.e., monocular grayscale images. Nonetheless, color and stereoscopic information, when available, are powerful cues for exactly locating the face and its pose and for feature analysis [2, 3]. Ref. [2] gives an overview of head pose estimation systems using stereo data. Some systems [4] work for large head rotations but require projection of structured light.

Most approaches to 3D pose estimation are based on geometric features such as points or lines. Typically, this involves three steps: 1) feature extraction; 2) specification of correspondence between data and model features; 3) pose estimation from correspondence. However, extraction of reliable features is difficult and establishing correspondence can be complex.

In this paper we present a simple and robust stereo-based pose estimation method that solves the geometric feature extraction problem with the help of a color-based 3D point set partitioning. The set of feature points represents the face region and is processed into a coarse triangular mesh. Correspondence is then determined by registering the coarse mesh to a detailed face mesh model using a variant of the ICP algorithm [5]. The model is obtained from a prior range scan of the person observed.
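The registration step above can be sketched with a plain point-to-point ICP loop: alternate nearest-neighbor correspondence with a closed-form rigid alignment (SVD-based, in the style of Kabsch/Horn), accumulating the pose. This is a generic textbook sketch; the paper uses a specific ICP variant [5] whose modifications are not reproduced here, and vertex arrays stand in for the coarse and detailed meshes.

```python
import numpy as np

def best_rigid_transform(src, dst):
    """Least-squares rotation R and translation t mapping src -> dst
    (closed-form SVD solution for paired points)."""
    cs, cd = src.mean(0), dst.mean(0)
    H = (src - cs).T @ (dst - cd)
    U, _, Vt = np.linalg.svd(H)
    R = Vt.T @ U.T
    if np.linalg.det(R) < 0:      # guard against a reflection solution
        Vt[-1] *= -1
        R = Vt.T @ U.T
    return R, cd - R @ cs

def icp(source, model, iters=30):
    """Point-to-point ICP: register source vertices (coarse face mesh)
    to model vertices; the accumulated (R, t) is the estimated pose."""
    R_tot, t_tot = np.eye(3), np.zeros(3)
    cur = source.copy()
    for _ in range(iters):
        # brute-force nearest neighbour of each source point in the model
        d = np.linalg.norm(cur[:, None, :] - model[None, :, :], axis=2)
        matched = model[d.argmin(axis=1)]
        R, t = best_rigid_transform(cur, matched)
        cur = cur @ R.T + t
        R_tot, t_tot = R @ R_tot, R @ t_tot + t
    return R_tot, t_tot
```

Because the alignment step is closed-form, each iteration only has to improve the correspondences; with a reasonable initial pose (as provided here by the face cluster) the loop converges in a handful of iterations.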
Finally, pose and model information is used to normalize the face to a uniform frontal view, which in turn simplifies facial feature extraction.

2. Method

In contrast to the general face detection task addressed by appearance-based methods [1], the single-face analysis design predetermines that there is exactly one face in the focus of the camera. Hence, a face/no-face classifier is not necessary, but the procedure for

0-7695-2521-0/06/$20.00 (c) 2006 IEEE
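The normalization idea can be sketched as applying the inverse of the estimated pose to the face points, yielding a pose-free frontal layout for feature localization. This is a geometric sketch only: the pose convention (x_cam = R x_model + t), the orthographic projection, and the scale parameter are assumptions for the example, not the paper's rendering procedure.

```python
import numpy as np

def normalize_to_frontal(points, R, t):
    """Map camera-frame face points into the model's frontal frame by
    inverting the estimated pose: x_model = R^T (x_cam - t)."""
    return (points - t) @ R

def frontal_projection(points, R, t, scale=1000.0):
    """Orthographic projection of the normalized face onto the x-y plane,
    giving a uniform frontal 2D layout for feature extraction."""
    return normalize_to_frontal(points, R, t)[:, :2] * scale
```

Since the model mesh is defined in a frontal reference frame, undoing the pose places eyes, nose and mouth at predictable 2D locations regardless of the original head orientation.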