DETECTING HUMAN BEHAVIOR EMOTIONAL CUES IN NATURAL INTERACTION
George Caridakis, Stylianos Asteriadis, Kostas Karpouzis and Stefanos Kollias
Intelligent Systems, Content and Interaction Lab
National Technical University of Athens
Iroon Polytexneiou 9, 15780 Zografou, Greece
{gcari, stiast, kkarpou, stefanos}@image.ntua.gr
ABSTRACT
The current work focuses on the detection of emotional cues in human behavior
and their incorporation into affect-aware Natural Interaction. Techniques for
extracting emotional cues based on visual, non-verbal human behavior are
presented; namely, qualitative gesture expressivity features, head pose and
eye gaze estimation are derived from hand and facial movement, respectively.
The extracted emotional cues are employed for expressive synthesis on virtual
agents, based on the analysis of actions performed by human users, both in a
Human-Virtual Agent Interaction setting and in Assistive Technologies aiming
to infer, in real time, the degree of attention or frustration of children
with reading difficulties.
Index Terms— Affective computing, Natural Interaction,
Gesture expressivity, Eye Gaze
1. INTRODUCTION
Affective computing has been a topic of great interest during
the last few years. Research has been performed in various
disciplines associated with interaction, including perception,
interpretation, cognition and expression. International conferences
and workshops have been organised on this topic, including the ACII
series and LREC workshops; recently, the IEEE Transactions on Affective
Computing (TAC) has published its first issues, while two new books
have been published [1], [2]. IST projects and networks have been
funded at the European level for investigating different issues of
affective interaction, such as theories and models of emotional
processes, computational modeling, emotional databases, input signal
analysis, emotion recognition and the generation of embodied
conversational agents; examples include the Interface, Ermis, Safira,
Humaine and Semaine projects.
Various results have been obtained, in Europe and worldwide
(US, Asia), by different projects, researchers and industry
regarding affective interaction. These mostly refer to the
derivation and analysis of affective and emotional theories
and related computational models; the extraction of affective
cues from single or multi-sensorial inputs, mainly aural and
visual; the modeling of affective states; the analysis and
recognition of user states based on extracted cues; the generation
of synthetic characters that communicate different expressive
states and attitudes; the generation of databases with affective
interactions for training and testing the analysis and synthesis
techniques; and the inclusion of the above in interactive
environments. The aforementioned activities have produced
a variety of systems that model and analyze single or multimodal
affective cues; they have extracted and used statistical
information and rules for this purpose; and they have created data
sets and environments which have subsequently been used to perform
user state detection, Embodied Conversational Agent (ECA)
synthesis and interaction.
2. RELATED WORK
For exhaustive surveys of existing work in machine analy-
sis of affective expressions, readers are referred to [3] and
[4]. Recent advancements and research directions in Affec-
tive Computing are also discussed in [1] and [2].
As an abundance of experimental studies has shown, incorporating
multiple modalities into affective analysis systems enhances their
performance and robustness; audiovisual fusion, for instance, can
exploit the complementary information carried by the two channels.
This reliability improvement comes at the cost of additional
challenges related to the multimodal aspect of affective analysis
and synthesis: multimodal fusion techniques, synchronization issues
and the absence or unreliability of information channels are the
challenges encountered most frequently. Information loss during
processing and feature extraction is fairly common in naturalistic
recordings, either due to technical limitations or due to
uncontrolled user behavior. Fusing multiple modalities alleviates
such problems by combining multiple flows of information either at
the feature level (early fusion) or at the decision level (late
fusion). Hybrid or ensemble techniques combining early and late
fusion have also been proposed recently [5]. The architecture of
affective analysis systems should cater for input from multiple
modalities, which vary in several aspects.
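The distinction between feature-level and decision-level fusion can be
sketched as follows. This is a minimal illustration, not the method used in
the present work: the feature vectors and class posteriors are hypothetical
placeholders, and the late-fusion rule shown is a simple weighted average of
per-modality classifier outputs.

```python
import numpy as np

def early_fusion(audio_feats, visual_feats):
    """Feature-level (early) fusion: concatenate per-modality feature
    vectors into a single vector fed to one classifier."""
    return np.concatenate([audio_feats, visual_feats])

def late_fusion(p_audio, p_visual, w=0.5):
    """Decision-level (late) fusion: combine per-modality class
    posteriors, here via a weighted average with weight w on audio."""
    return w * p_audio + (1.0 - w) * p_visual

# Hypothetical features from one analysis window.
audio_feats = np.array([0.2, 0.7, 0.1])   # e.g. prosodic features
visual_feats = np.array([0.5, 0.3])       # e.g. head pose / gaze features
fused = early_fusion(audio_feats, visual_feats)
print(fused.shape)  # (5,)

# Hypothetical per-modality posteriors over four affective states.
p_a = np.array([0.6, 0.2, 0.1, 0.1])
p_v = np.array([0.3, 0.4, 0.2, 0.1])
print(late_fusion(p_a, p_v))  # [0.45 0.3  0.15 0.1 ]
```

In a late-fusion scheme, the weight w can also be set per modality according
to its estimated reliability, which is one way to handle an absent or noisy
channel.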
Much research work has been carried out on automatic
detection of basic, acted and extreme emotions recorded in
978-1-4577-0274-7/11/$26.00 ©2011 IEEE DSP2011