BIOLOGICALLY VS. LOGIC INSPIRED ENCODING OF FACIAL ACTIONS AND EMOTIONS IN VIDEO

M.F. Valstar and M. Pantic
Computing Department, Imperial College London, UK
{M.F.Valstar, M.Pantic}@imperial.ic.ac.uk

Abstract

Automatic facial expression analysis is an important aspect of Human Machine Interaction, as the face is an important communicative medium. We use our face to signal interest, disagreement, intentions or mood through subtle facial motions and expressions. Work on automatic facial expression analysis can roughly be divided into the recognition of prototypic facial expressions, such as the six basic emotional states, and the recognition of atomic facial muscle actions (Action Units, AUs). Detecting AUs rather than emotions makes facial expression detection independent of culture-dependent interpretation, reduces the dimensionality of the problem and reduces the amount of training data required. Classic psychological studies suggest that humans consciously map AUs onto the basic emotion categories using a finite number of rules. On the other hand, recent studies suggest that humans recognize emotions unconsciously, through a process that is perhaps best modeled by artificial neural networks (ANNs). This paper investigates these two claims. A comparison is made between detecting emotions directly from features and a two-step approach in which we first detect AUs and then use the AUs as input to either a rulebase or an ANN to recognize emotions. The results suggest that the two-step approach is possible with only a small loss of accuracy, and that biologically inspired classification techniques outperform those that approach the classification problem from a logical perspective, suggesting that biologically inspired classifiers are more suitable for computer-based analysis of facial behaviour than logic inspired methods.

1. INTRODUCTION

The ability to detect and understand facial expressions and other social signals of someone with whom we are communicating is at the core of social and emotional intelligence. Human Machine Interaction systems capable of sensing stress, inattention and heedfulness, and able to adapt and respond to these affective states of users, are likely to be perceived as more natural, efficacious and trustworthy. But what exactly is an affective state? Traditionally the terms "affect" and "emotion" have been used synonymously. Following Darwin, discrete emotion theorists propose the existence of six or more basic emotions that are universally displayed and recognized [8]. These include emotions such as happiness, anger, sadness, surprise, disgust and fear. Data from both modern Western and traditional societies suggest that the non-verbal communicative signals (especially facial expressions) involved in these basic emotions are displayed and recognized cross-culturally [8]. However, in real life people show far more expressions, such as 'boredom' or 'I don't know'. There is much less evidence that these subtler expressions are universally displayed and interpreted as well.

Table 1. Rules for mapping Action Units to emotions, according to the FACS investigator's guide. A||B means "either A or B".
Happy:    {12}; {6,12}
Sadness:  {1,4}; {1,4,11||15}; {1,4,15,17}; {6,15}; {11,17}; {1}
Surprise: {1,2,5,26||27}; {1,2,5}; {1,2,26||27}; {5,26||27}
Disgust:  {9||10,17}; {9||10,16,25||26}; {9||10}; {17,24}
Fear:     {1,2,4}; {1,2,4,5,20,25||26||27}; {1,2,4,5,25||26||27}; {1,2,4,5}; {1,2,5,25||26||27}; {5,20,25||26||27}; {5,20}; {20}
Anger:    {4,5,7,10,22,23,25||26}; {4,5,7,10,23,25||26}; {4,5,7,17,23||24}; {4,5,7,23||24}; {4,5||7}

Instead of directly classifying facial expressions into a finite number of basic emotion classes, we could also try to recognize the underlying facial muscle activities and then interpret these in terms of arbitrary categories such as emotions, attitudes or moods [11]. The Facial Action Coding System (FACS) [4] is the best known and most commonly used system developed for human observers to describe facial activity in terms of visually observable facial muscle actions (i.e., Action Units, AUs). Using FACS, human observers uniquely decompose a facial expression into one or more of the 44 AUs in total that produced the expression in question.

Classic psychological studies, such as those underlying EMFACS (Emotional FACS), suggest that it is possible to map AUs onto the basic emotion categories using a finite number of rules (as suggested in the FACS investigator's guide [4], Table 1). This effectively suggests that facial expressions are decoded at a conscious level of awareness. Alternative studies, like the one on "thin slices of behaviour" [1], suggest that human expressive nonverbal cues such as facial expressions are neither encoded nor decoded at an intentional, conscious level of awareness.
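A rulebase of this kind can be sketched as a simple lookup: an expression is assigned an emotion if its detected AU set satisfies one of that emotion's rules, where each rule lists required AUs and a term like 25||26||27 is satisfied by any one of the listed alternatives. The sketch below (illustrative, not the authors' implementation; only a subset of the Table 1 rules is encoded, and the function names are our own) assumes a rule fires when every required slot is covered by the detected AUs:

```python
# Each rule is a list of slots; each slot is a tuple of alternative AUs
# (e.g. (26, 27) encodes "26||27"). A subset of the Table 1 rules:
RULES = {
    "Happy":    [[(12,)], [(6,), (12,)]],
    "Surprise": [[(1,), (2,), (5,), (26, 27)]],
    "Disgust":  [[(9, 10), (17,)]],
    "Fear":     [[(1,), (2,), (4,)]],
}

def matches(aus, rule):
    # A rule fires when every slot is covered by at least one detected AU.
    return all(any(a in aus for a in slot) for slot in rule)

def classify(aus):
    # Return every emotion for which at least one rule fires.
    return [emo for emo, rules in RULES.items()
            if any(matches(aus, r) for r in rules)]

print(classify({1, 2, 5, 27}))  # -> ['Surprise']
print(classify({6, 12}))        # -> ['Happy']
```

In the two-step approach described above, the input to `classify` would be the AU set produced by the AU detection stage; replacing this lookup with an ANN trained on the same AU vectors gives the biologically inspired alternative.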
In turn, this finding suggests that biologically inspired classification techniques like artificial neural networks (ANNs) may prove more suitable for tackling the problem of (basic) emotion recognition from AUs, as such techniques emulate human unconscious problem solving processes, in contrast to rule-based techniques, which are inspired by human conscious problem solving processes.

Recent work on emotion detection using biologically inspired algorithms has used ANNs [5], SVMs [2], Bayesian Networks [3, 16] and Hidden Markov Models (HMMs) [3]. Recent work on facial