Under review, Journal of Robotics and Autonomous Systems

Beyond Prototypic Expressions: Discriminating Subtle Changes in the Face

James Jenn-Jier Lien and Takeo Kanade
Robotics Institute, Carnegie Mellon University

Jeffrey F. Cohn
Department of Psychology, University of Pittsburgh, and Robotics Institute, Carnegie Mellon University

Ching-Chung Li
Department of Electrical Engineering, University of Pittsburgh

Corresponding author: Jeffrey F. Cohn, Clinical Psychology Program, 4015 O'Hara Street, Pittsburgh, PA 15260, USA. Email: jeffcohn@vms.cis.pitt.edu

Abstract

Current approaches to automated facial expression analysis focus on a small set of prototypic expressions (e.g., joy). Prototypic expressions occur infrequently, however, and emotion expression is more varied. To capture the full range of emotion expression, recognition of fine-grained changes in facial expression is needed. We developed and implemented a computer vision system that is sensitive to subtle changes in the face. Three convergent modules extract feature information and recognize FACS action units using discriminant analysis or hidden Markov models. The modules are feature-point tracking, dense-flow tracking with principal component analysis, and high-gradient component detection. Each module demonstrated high concurrent validity with manual FACS coding.

1 Introduction

Most computer-vision-based approaches to facial expression analysis [e.g., 3, 25] attempt to recognize only a small set of prototypic expressions of emotion. This focus follows from the work of Darwin [10] and, more recently, Ekman [13] and Izard et al. [21], who proposed that “basic emotions” (i.e., joy, surprise, anger, sadness, fear, and disgust) each have a prototypic facial expression. These expressions all involve changes in facial features in multiple regions of the face, which facilitates analysis. In everyday life, however, prototypic expressions occur relatively infrequently, and emotion is more often communicated by changes in one or two discrete features, such as tightening the lips in anger [5]. Change in isolated features, especially the brows or eyelids, is also typical of paralinguistic displays (such as raising the brows to signal greeting). To capture the subtlety of human emotion and paralinguistic communication, automated recognition of fine-grained changes in facial expression is needed.

The Facial Action Coding System (FACS) [15] is a human-observer-based system designed to detect subtle changes in facial features. Using FACS and viewing videotaped facial behavior in slow motion, trained observers can manually code all possible facial displays, which are referred to as “action units” (AUs). More than 7,000 combinations have been observed [12]. Ekman and Friesen [16] proposed that specific combinations of FACS action units represent prototypic expressions of emotion (i.e., joy, sadness, anger, disgust, fear, and surprise). Emotion expressions, however, are not part of FACS; they are coded in separate systems, such as EMFACS [19]. FACS itself is purely descriptive, uses no emotion or other inferential labels, and provides the necessary ground truth with which to describe facial expression.

1.1 Principal component analysis of face images

Several approaches to automated facial feature extraction and recognition have proven promising. One approach, initially developed for face recognition, uses principal component analysis (PCA) of face images and artificial neural networks. Face images are represented as linear combinations of a set of “eigenfaces” [34].
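To make the eigenface representation concrete, the following is a minimal sketch in Python with NumPy, assuming a matrix of vectorized, mutually aligned grayscale face images. The function names, dimensions, and synthetic data are illustrative assumptions, not details taken from [34] or from the present system.

```python
import numpy as np

def compute_eigenfaces(faces, k):
    """faces: (n_images, n_pixels) array of vectorized face images.
    Returns the mean face and the top-k principal components ("eigenfaces")."""
    mean_face = faces.mean(axis=0)
    centered = faces - mean_face
    # SVD of the centered data matrix; rows of vt are the eigenfaces,
    # ordered by decreasing variance explained.
    _, _, vt = np.linalg.svd(centered, full_matrices=False)
    return mean_face, vt[:k]

def project(face, mean_face, eigenfaces):
    """Combination coefficients of one face in the eigenface basis."""
    return eigenfaces @ (face - mean_face)

# Illustrative usage with synthetic data: 100 random 64x64 "faces",
# reduced to a 20-dimensional coefficient vector per image.
rng = np.random.default_rng(0)
faces = rng.random((100, 64 * 64))
mean_face, eigenfaces = compute_eigenfaces(faces, k=20)
coeffs = project(faces[0], mean_face, eigenfaces)  # input to a classifier
```

Reconstructing a face as the linear combination mean_face + coeffs @ eigenfaces recovers its least-squares approximation in this basis, which is the sense in which face images are “represented as linear combinations of eigenfaces.”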
These combination coefficients are then input to an artificial neural network or other classifier. Using this approach, Padgett, Cottrell, and Adolphs [27] recognized 86% of six prototypic emotion expressions as defined by Ekman (i.e., joy,