EXPRESSIVE AVATARS IN MPEG-4

M. Mancini, B. Hartmann, C. Pelachaud
IUT of Montreuil - University of Paris 8
{m.mancini, c.pelachaud}@iut.univ-paris8.fr

A. Raouzaiou, K. Karpouzis
Image, Video and Multimedia Systems Laboratory, National Technical University of Athens
{araouz, kkarpou}@image.ece.ntua.gr

ABSTRACT

Man-Machine Interaction (MMI) systems that utilize multimodal information about users' current emotional state are presently at the forefront of interest of the computer vision and artificial intelligence communities. A lifelike avatar can enhance interactive applications. In this paper, we present the implementation of GretaEngine and synthesized expressions, including intermediate ones, based on the MPEG-4 standard and Whissel's emotion representation.

1. INTRODUCTION

Research in facial expression analysis and synthesis has mainly concentrated on archetypal emotions. In particular, sadness, anger, joy, fear, disgust and surprise are the categories of emotions that have attracted most of the interest in human-computer interaction environments. Moreover, the MPEG-4 standard indicates an alternative way of modeling facial expressions and the underlying emotions through Facial Animation Parameters (FAPs), an approach strongly influenced by neurophysiological and psychological studies. The adoption of token-based animation in the MPEG-4 framework [1] benefits the definition of emotional states, since the extraction of simple, symbolic parameters is more appropriate for analyzing, as well as synthesizing, facial expressions and hand gestures.

In this paper we describe the implementation of GretaEngine and an approach to synthesizing expressions, including intermediate ones, via the tools provided in the MPEG-4 standard, based on real measurements and on universally accepted assumptions about their meaning, taking into account the results of Whissel's study [1]. The results of the synthesis process can then be applied to avatars, so as to convey the communicated messages more vividly than plain textual information, or simply to make interaction more lifelike.

2. EMOTION REPRESENTATION

The obvious goal for emotion analysis applications is to assign category labels that identify emotional states. However, labels as such are very poor descriptions, especially since humans use a daunting number of labels to describe emotion. Activation-evaluation space [3] is a representation that is both simple and capable of capturing a wide range of significant issues in emotion. A basic attraction of that arrangement is that it provides a way of describing emotional states which is more tractable than using words, but which can be translated into and out of verbal descriptions. Translation is possible because emotion-related words can be understood, at least to a first approximation, as referring to positions in activation-evaluation space. Various techniques lead to that conclusion, including factor analysis, direct scaling, and others.

3. FACIAL EXPRESSION IN MPEG-4

3.1. Modeling Primary Expressions Using Motion Capture Data

We currently use a system based on key-frame animation, in which an expression is defined by three temporal parameters, namely onset, apex and offset. Such a specification, however, does not allow one to capture the subtle dynamics of facial expressions.
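As an illustration of this key-frame scheme, the following Python sketch shows how a single FAP could be driven by onset, apex and offset durations. It is hypothetical code, not the actual GretaEngine implementation; the function name, the frame rate, the linear interpolation and the example FAP choice are assumptions made for clarity.

def fap_envelope(peak_value, onset, apex, offset, fps=25):
    """Return per-frame intensities of one FAP for a single expression occurrence.

    peak_value : FAP amplitude reached at the apex
    onset      : seconds to ramp from the neutral value (0) up to peak_value
    apex       : seconds during which the peak value is held
    offset     : seconds to decay back to the neutral value
    """
    frames = []
    # Onset: linear ramp from neutral to the peak value.
    n_on = int(onset * fps)
    for i in range(n_on):
        frames.append(peak_value * (i + 1) / n_on)
    # Apex: hold the peak value.
    frames.extend([peak_value] * int(apex * fps))
    # Offset: linear decay back to neutral.
    n_off = int(offset * fps)
    for i in range(n_off):
        frames.append(peak_value * (1.0 - (i + 1) / n_off))
    return frames

if __name__ == "__main__":
    # e.g. a lip-corner stretch FAP (stretch_l_cornerlip) for a brief smile:
    # 0.2 s onset, 0.5 s apex, 0.3 s offset at 25 frames per second.
    smile = fap_envelope(peak_value=300, onset=0.2, apex=0.5, offset=0.3)
    print(len(smile), "frames, peak =", max(smile))

Because the envelope is piecewise linear, every expression rises, holds and decays in the same stereotyped way; this is precisely the limitation, noted above, that motion capture data is meant to address.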
In order to improve these animations, we are studying real data of facial movements coming from motion capture sequences recorded using an Oxford Metrics Vicon system (www.vicon.com); we are very grateful to Franck Multon of the University of Rennes 2 for recording the motion capture data. Our data are organized into 78 sequences performed by two actors, a man and a woman, each having 33 markers on the face, 21 of which correspond to FAP (Facial Animation Parameter) locations. These sequences are simple basic movements, like raising the eyebrows or smiling, and basic emotions such as anger, happiness and surprise. Finally, we recorded two sequences of monologues in which extreme expressions of emotions were displayed.
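To indicate how such marker trajectories can be related to FAPs, the sketch below converts a marker's displacement from its neutral-frame position into an MPEG-4 FAP value, exploiting the fact that FAP amplitudes are expressed as 1/1024 fractions of face-specific FAPUs (e.g. ENS, eye-nose separation, or MW, mouth width). The FAPU measurements, marker coordinates, FAP/FAPU pairings and function names here are illustrative assumptions, not values taken from our corpus.

# FAPU distances measured once on the neutral face of one actor (in mm); illustrative.
FAPU_MM = {"ENS": 60.0, "MNS": 30.0, "MW": 55.0, "ES": 65.0}

# FAPU and motion axis assumed for each illustrative FAP (axis: 0 = x, 1 = y, 2 = z).
FAP_SPEC = {
    "raise_l_i_eyebrow": ("ENS", 1),   # vertical motion, eye-nose separation units
    "stretch_l_cornerlip": ("MW", 0),  # horizontal motion, mouth-width units
}

def marker_to_fap(fap_name, neutral_pos, current_pos):
    """Convert one marker's displacement from its neutral position into a FAP value."""
    fapu_name, axis = FAP_SPEC[fap_name]
    displacement_mm = current_pos[axis] - neutral_pos[axis]
    # FAP values are integer multiples of 1/1024 of the corresponding FAPU distance.
    return int(round(1024.0 * displacement_mm / FAPU_MM[fapu_name]))

if __name__ == "__main__":
    neutral = (31.0, 48.0, 12.0)   # left inner eyebrow marker, neutral frame (mm)
    raised = (31.2, 53.5, 12.1)    # same marker in a "raise eyebrows" frame (mm)
    print(marker_to_fap("raise_l_i_eyebrow", neutral, raised))  # ~94 FAPU/1024 units

Applying such a conversion frame by frame yields FAP curves whose measured shapes can then replace the stereotyped linear onset-apex-offset envelopes described above.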