DETECTING HUMAN BEHAVIOR EMOTIONAL CUES IN NATURAL INTERACTION
George Caridakis, Stylianos Asteriadis, Kostas Karpouzis and Stefanos Kollias
Intelligent Systems, Content and Interaction Lab
National Technical University of Athens
Iroon Polytexneiou 9, 15780 Zografou, Greece
{gcari, stiast, kkarpou, stefanos}@image.ntua.gr
ABSTRACT
The current work focuses on the detection of emotional cues in human behavior
and their incorporation into affect-aware Natural Interaction. Techniques for
extracting emotional cues based on visual, non-verbal human behavior are
presented; namely, qualitative gesture expressivity features, head pose and
eye gaze estimation are derived from hand and facial movement, respectively.
The extracted emotional cues are employed for expressive synthesis on virtual
agents, based on the analysis of actions performed by human users, both in a
Human-Virtual Agent Interaction setting and in Assistive Technologies aiming
to infer, in real time, the degree of attention or frustration of children
with reading difficulties.
Index Terms— Affective computing, Natural Interaction,
Gesture expressivity, Eye Gaze
1. INTRODUCTION
Affective computing has been a topic of great interest during
the last few years. Research has been performed in various
disciplines associated with interaction, including perception,
interpretation, cognition and expression. International conferences
and workshops have been organised on this topic, including the ACII
series and LREC workshops; recently, the IEEE Transactions on Affective
Computing (TAC) has published its first issues, while two new books
have been published [1], [2]. IST projects and networks have been
funded at the European level for investigating different issues of
affective interaction, such as theories and models of emotional
processes, computational modeling, emotional databases, input signal
analysis, emotion recognition and the generation of embodied
conversational agents; examples include the Interface, Ermis, Safira,
Humaine and Semaine projects.
Various results have been obtained, in Europe and worldwide
(US, Asia), by different projects, researchers and industry
regarding affective interaction. These mostly refer to the
derivation and analysis of affective and emotional theories
and related computational models; the extraction of affective
cues from single or multi-sensorial inputs, mainly aural and
visual; the modeling of affective states; the analysis and
recognition of user states based on extracted cues; the generation
of synthetic characters that communicate different expressive
states and attitudes; the generation of databases with affective
interactions for training and testing the analysis and synthesis
techniques; and the inclusion of the above in interactive
environments. The aforementioned activities have produced
a variety of systems that model and analyze single or multimodal
affective cues; they have extracted and used statistical
information and rules for this purpose; and they have created data
sets and environments which have subsequently been used to perform
user state detection, Embodied Conversational Agent (ECA)
synthesis and interaction.
2. RELATED WORK
For exhaustive surveys of existing work in machine analy-
sis of affective expressions, readers are referred to [3] and
[4]. Recent advancements and research directions in Affec-
tive Computing are also discussed in [1] and [2].
As an abundance of experimental studies has shown, incorporating
multiple modalities into affective analysis systems enhances their
performance and robustness; audiovisual fusion, for instance, can
exploit the complementary information carried by the two channels.
This reliability improvement comes at the cost of additional
challenges related to the multimodal aspect of affective analysis
and synthesis: multimodal fusion techniques, synchronization issues
and the absence or unreliability of information channels are the
challenges encountered most frequently. Information loss during
processing and feature extraction is fairly common in naturalistic
recordings, either due to technical limitations or due to
uncontrolled user behavior. Fusing multiple modalities alleviates
such problems by combining multiple flows of information either at
the feature level (early fusion) or at the decision level (late
fusion). Hybrid or ensemble techniques combining early and late
fusion have also been proposed recently [5]. The architecture of
affective analysis systems should cater for input from multiple
modalities, which vary in several aspects.
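The distinction between feature-level and decision-level fusion can be
sketched as follows. This is a minimal illustration, not the method used in
the present work: the feature vectors and class posteriors are hypothetical
placeholders, and the late-fusion rule shown is a simple weighted average of
per-modality classifier outputs.

```python
import numpy as np

def early_fusion(audio_feats, visual_feats):
    """Feature-level (early) fusion: concatenate per-modality feature
    vectors into a single vector fed to one classifier."""
    return np.concatenate([audio_feats, visual_feats])

def late_fusion(p_audio, p_visual, w=0.5):
    """Decision-level (late) fusion: combine per-modality class
    posteriors, here via a weighted average with weight w on audio."""
    return w * p_audio + (1.0 - w) * p_visual

# Hypothetical features from one analysis window.
audio_feats = np.array([0.2, 0.7, 0.1])   # e.g. prosodic features
visual_feats = np.array([0.5, 0.3])       # e.g. head pose / gaze features
fused = early_fusion(audio_feats, visual_feats)
print(fused.shape)  # (5,)

# Hypothetical per-modality posteriors over four affective states.
p_a = np.array([0.6, 0.2, 0.1, 0.1])
p_v = np.array([0.3, 0.4, 0.2, 0.1])
print(late_fusion(p_a, p_v))  # [0.45 0.3  0.15 0.1 ]
```

In a late-fusion scheme, the weight w can also be set per modality according
to its estimated reliability, which is one way to handle an absent or noisy
channel.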
Much research work has been carried out on automatic
detection of basic, acted and extreme emotions recorded in
978-1-4577-0274-7/11/$26.00 ©2011 IEEE DSP2011