Facial Feature Extraction using Deformable Graphs and Statistical Pattern Matching

Jörgen Ahlberg
Jorgen.Ahlberg@utc.fr
Laboratoire Heudiasyc UMR CNRS 6599, Université de Technologie de Compiègne, Centre de Recherches de Royallieu, BP 20529, FR-60205 Compiègne, France

Abstract

In model-based coding of image sequences containing human faces, e.g., videophone sequences, the detection and location of the face, as well as the extraction of facial features from the images, are crucial. Facial feature extraction can be regarded as an optimization problem: searching for the optimum adaptation parameters of the model, where the optimum is defined as the minimum distance between the extracted face and a face space. There are different approaches to reducing the computational complexity; here, a scheme using deformable graphs and dynamic programming is described. Experiments have been performed with promising results.

I. MODEL-BASED CODING

Since the major application of the techniques described in this document is model-based coding, an introduction to that topic follows here. For more details, see [2, 9, 10, 14].

The basic idea of model-based coding of video sequences is illustrated in Fig. 1. At the encoding side of a visual communication system (typically, a videophone system), the image from the camera is analysed using computer vision techniques, and the relevant object(s), for example a human face, is identified. A general or specific model is then adapted to the object; usually, the model is a wireframe describing the 3-D shape of the object.

Instead of transmitting the full image pixel by pixel, or by coefficients describing the waveform of the image, the image is handled as a 2-D projection of 3-D objects in a scene. To achieve this, parameters describing the object(s) are extracted, coded and transmitted. Typical parameters are size, position and shape.
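The idea of treating the image as a 2-D projection of a 3-D wireframe can be sketched as follows. This is a minimal illustration, not the paper's implementation: it assumes an orthographic projection and hypothetical global adaptation parameters (rotation about the x- and y-axes, translation, and scale).

```python
import math

def project_vertices(vertices, rx=0.0, ry=0.0, tx=0.0, ty=0.0, scale=1.0):
    """Project 3-D wireframe vertices onto the image plane, given global
    adaptation parameters: rotations rx, ry (radians), translation (tx, ty),
    and a scale factor. Orthographic projection is assumed for simplicity.
    """
    out = []
    cx, sx = math.cos(rx), math.sin(rx)
    cy, sy = math.cos(ry), math.sin(ry)
    for (x, y, z) in vertices:
        # rotate about the x-axis
        y1 = cx * y - sx * z
        z1 = sx * y + cx * z
        # rotate about the y-axis
        x2 = cy * x + sy * z1
        # orthographic projection: scale, translate, and drop the depth
        out.append((scale * x2 + tx, scale * y1 + ty))
    return out
```

In a coder along these lines, only the handful of numbers (rx, ry, tx, ty, scale) would be transmitted per frame, rather than the pixels of the projected image.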
To achieve acceptable visual similarity to the original image, the texture of the object is also transmitted. The texture can be compressed by some traditional image coding technique, but specialized techniques that lower the bit-rate considerably for certain applications have recently been published [15, 16].

At the receiver side of the system, the parameters are decoded and the decoder’s model is modified accordingly. The model is then synthesized as a visual object using computer graphics techniques; e.g., a wireframe is shaped according to the shape and size parameters, and the texture is mapped onto its surfaces.

For the following images, parameters describing the change of the model are transmitted. Typically, these parameters tell how to rotate and translate the model, and, in the case of a non-rigid object like a human face, parameters describing the motion of individual vertices of the wireframe are transmitted. This constitutes the largest gain of model-based coding, since the motion parameters can be transmitted at very low bit-rates [1].

Definitions for the coding and representation of parameters for model-based coding and animation of human faces are included in the recently adopted international standard MPEG-4 [12, 13].

Components of a Model-Based Coding System

To encode an image sequence in a model-based scheme, we first need to detect and locate the face. This can be done by, e.g., colour discrimination, detection of elliptical objects using Hough transforms, connectionist/neural network methods, or statistical pattern matching.

Fig. 1: A model-based coding system. (The figure shows the encoder and the decoder, each holding a copy of the model, connected by a channel; original images enter the encoder and synthesized images leave the decoder.)
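Of the face detection approaches listed above, colour discrimination is the simplest to sketch. The following is an illustrative sketch only: the chromaticity thresholds and the helper names (`skin_mask`, `bounding_box`) are assumptions for this example and are not taken from the paper.

```python
def skin_mask(image):
    """Classify each pixel as skin/non-skin by thresholding its normalized
    r-g chromaticity. `image` is a list of rows of (R, G, B) tuples.
    The threshold box is illustrative, not tuned or taken from the paper.
    """
    mask = []
    for row in image:
        mrow = []
        for (R, G, B) in row:
            s = R + G + B
            if s == 0:
                mrow.append(False)
                continue
            r, g = R / s, G / s
            # crude skin chromaticity box (hypothetical thresholds)
            mrow.append(0.35 < r < 0.55 and 0.25 < g < 0.37)
        mask.append(mrow)
    return mask

def bounding_box(mask):
    """Bounding box (x0, y0, x1, y1) of the skin-labelled pixels, as a
    crude face-location estimate; None if no skin pixels were found."""
    coords = [(x, y) for y, row in enumerate(mask)
                     for x, v in enumerate(row) if v]
    if not coords:
        return None
    xs = [x for x, _ in coords]
    ys = [y for _, y in coords]
    return (min(xs), min(ys), max(xs), max(ys))
```

Such a colour-based stage would typically only provide a coarse face candidate region, to be refined by one of the other methods, e.g., the statistical pattern matching discussed in this paper.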