EXPRESSIVE AVATARS IN MPEG-4
M. Mancini, B. Hartmann, C. Pelachaud
IUT of Montreuil - University of Paris 8
{m.mancini, c.pelachaud}@iut.univ-paris8.fr

A. Raouzaiou, K. Karpouzis
Image, Video and Multimedia Systems Laboratory, National Technical University of Athens
{araouz, kkarpou}@image.ece.ntua.gr
ABSTRACT
Man-Machine Interaction (MMI) Systems that utilize mul-
timodal information about users’ current emotional state
are presently at the forefront of interest of the computer
vision and artificial intelligence communities. A lifelike
avatar can enhance interactive applications. In this paper,
we present the implementation of GretaEngine and the
synthesis of expressions, including intermediate ones,
based on the MPEG-4 standard and Whissell's emotion
representation.
1. INTRODUCTION
Research in facial expression analysis and synthesis has
mainly concentrated on archetypal emotions. In particular,
sadness, anger, joy, fear, disgust and surprise are the
categories of emotions that have attracted most interest in
human-computer interaction environments. Moreover, the
MPEG-4 standard offers an alternative way of modeling
facial expressions and the underlying emotions through
Facial Animation Parameters (FAPs), an approach strongly
influenced by neurophysiological and psychological studies.
The adoption of token-based animation in the MPEG-4
framework [1] benefits the definition of emotional states,
since the extraction of simple, symbolic parameters is well
suited to both the analysis and the synthesis of facial
expressions and hand gestures.
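The token-based idea above can be illustrated with a minimal sketch: a facial configuration is a sparse set of (FAP id, value) pairs that can be manipulated symbolically. The FAP ids and amplitudes below are hypothetical placeholders, not the normative MPEG-4 numbering or FAPU-scaled values.

```python
# Sketch of token-based facial animation in the spirit of MPEG-4 FAPs:
# each frame is a sparse mapping from FAP id to displacement value.
# Ids and amplitudes are illustrative only.
smile_frame = {
    12: 300,   # hypothetical lip-corner FAP
    13: 300,   # hypothetical lip-corner FAP (other side)
    31: -50,   # hypothetical eyebrow FAP
}

def interpolate(frame_a, frame_b, t):
    """Linearly blend two sparse FAP frames, t in [0, 1]."""
    ids = set(frame_a) | set(frame_b)
    return {i: (1 - t) * frame_a.get(i, 0) + t * frame_b.get(i, 0)
            for i in ids}

neutral = {}
half_smile = interpolate(neutral, smile_frame, 0.5)
```

Because frames are plain symbolic tokens, the same representation serves analysis (extracting FAP values from video) and synthesis (driving an avatar).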
In this paper we describe the implementation of Gre-
taEngine and an approach to synthesize expressions, in-
cluding intermediate ones, via the tools provided in the
MPEG-4 standard, based on real measurements and on
universally accepted assumptions about their meaning,
taking into account the results of Whissell's study [1]. The results of
the synthesis process can then be applied to avatars, so as
to convey the communicated messages more vividly than
plain textual information or simply to make interaction
more lifelike.
2. EMOTION REPRESENTATION
The obvious goal for emotion analysis applications is to
assign category labels that identify emotional states. How-
ever, labels as such are very poor descriptions, especially
since humans use a daunting number of labels to describe
emotion.
Activation-evaluation space [3] is a representation
that is both simple and capable of capturing a wide range
of significant issues in emotion. A basic attraction of that
arrangement is that it provides a way of describing emo-
tional states which is more tractable than using words, but
which can be translated into and out of verbal descrip-
tions. Translation is possible because emotion-related
words can be understood, at least to a first approximation,
as referring to positions in activation-evaluation space. Vari-
ous techniques lead to that conclusion, including factor
analysis, direct scaling, and others.
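This translation between words and positions can be sketched concretely: each emotion word is a point in the two-dimensional space, and any point can be mapped back to its nearest word. The coordinates below are illustrative guesses for the sake of the example, not Whissell's measured values.

```python
import math

# Illustrative coordinates in activation-evaluation space:
# evaluation (negative-positive) on x, activation (passive-active)
# on y, both normalized to [-1, 1]. Values are placeholders, not
# Whissell's measurements.
EMOTION_COORDS = {
    "joy":     (0.8,  0.5),
    "anger":   (-0.6, 0.8),
    "sadness": (-0.7, -0.4),
    "fear":    (-0.5, 0.7),
}

def nearest_label(evaluation, activation):
    """Translate a point in the space back into an emotion word."""
    return min(EMOTION_COORDS,
               key=lambda e: math.dist(EMOTION_COORDS[e],
                                       (evaluation, activation)))
```

A point such as (0.7, 0.4) would be labeled "joy"; intermediate emotional states correspond to positions between the labeled anchors, which is what makes the representation more tractable than a flat list of words.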
3. FACIAL EXPRESSION IN MPEG-4
3.1. Modeling Primary Expressions Using Motion Capture Data
We currently use a system based on key frame animation,
where an expression is defined by three temporal parameters,
namely onset, apex and offset. However, such a specification
does not capture the subtle dynamics of facial expressions.
To improve these animations, we are studying real facial
movement data from motion capture sequences recorded with
an Oxford Metrics Vicon system (www.vicon.com); we are
grateful to Franck Multon of the University of Rennes 2 for
recording the motion capture data. Our data are
organized into 78 sequences performed by two actors, a
man and a woman, each with 33 markers on the face, 21
of which correspond to FAP (Facial Animation Parameter)
locations. These sequences are simple basic movements,
like raising the eyebrows or smiling, and basic emotions
such as anger, happiness and surprise. Finally, we recorded
two sequences of monologues in which extreme
expressions of emotions were displayed.
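The onset/apex/offset specification described above can be sketched as a piecewise-linear intensity envelope over time. This is a deliberate simplification (and precisely the kind of profile the motion-capture data is meant to refine); the function and its parameter names are our own illustration, not part of the MPEG-4 standard.

```python
def fap_envelope(t, onset, apex_end, offset_end, peak=1.0):
    """Piecewise-linear intensity profile for one expression:
    ramp up from 0 to peak during [0, onset], hold at peak during
    [onset, apex_end], ramp back to 0 during [apex_end, offset_end].
    """
    if t <= 0:
        return 0.0
    if t < onset:
        return peak * t / onset            # onset: linear rise
    if t <= apex_end:
        return peak                        # apex: sustained expression
    if t < offset_end:
        return peak * (offset_end - t) / (offset_end - apex_end)  # offset
    return 0.0
```

Scaling each FAP displacement by this envelope yields the key-frame behavior; replacing the linear ramps with curves fitted to the recorded sequences is one way the captured data can make the animation more subtle.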
0-7803-9332-5/05/$20.00 ©2005 IEEE