Registration Invariant Representations for Expression Detection

Patrick Lucey 1,2, Simon Lucey 3 and Jeffrey F. Cohn 1,2
1 Robotics Institute, Carnegie Mellon University, Pittsburgh, PA, USA
2 Department of Psychology, University of Pittsburgh, Pittsburgh, PA, USA
3 ICT Center, CSIRO, Sydney, Australia
plucey@pitt.edu, simon.lucey@csiro.au, jeffcohn@cs.cmu

Abstract

Active appearance model (AAM) representations have recently been used to great effect in the accurate detection of expression events (e.g., action units, pain, broad expressions, etc.). The motivation for their use, and the rationale for their success, lie in their ability to: (i) provide dense (i.e., 60-70 points on the face) registration accuracy on par with a human labeler, and (ii) decompose the registered face image into separate appearance and shape representations. Unfortunately, this human-like registration performance is isolated to registration algorithms that are specifically tuned to the illumination, camera and subject being tracked (i.e., "subject dependent" algorithms). As a result, it is rare to see AAM representations being employed in the far more useful "subject independent" situations (i.e., where illumination, camera and subject are unknown), due to the inherently increased geometric noise present in the estimated registration. In this paper we argue that "AAM-like" expression detection results can be obtained in the presence of noisy dense registration through the employment of registration invariant representations (e.g., Gabor magnitudes and HOG features). We demonstrate that good expression detection performance can still be enjoyed over the types of geometric noise often encountered with the more geometrically noisy state-of-the-art generic algorithms (e.g., Bayesian Tangent Shape Models (BTSM), Constrained Local Models (CLM), etc.). We show these results on the extended Cohn-Kanade (CK+) database over all facial action units.
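To make the claimed invariance concrete, the following minimal sketch (our own illustrative code, not from the paper; the synthetic face crop, filter parameters and function names are all assumptions) compares how much a raw pixel representation and a Gabor magnitude representation change under a 2-pixel registration error. HOG features gain a similar tolerance through their soft spatial pooling of gradient orientations.

```python
import numpy as np

def gabor_kernel(size, freq, theta, sigma):
    """Complex Gabor kernel: a Gaussian envelope times a complex carrier."""
    half = size // 2
    y, x = np.mgrid[-half:half + 1, -half:half + 1]
    xr = x * np.cos(theta) + y * np.sin(theta)
    envelope = np.exp(-(x**2 + y**2) / (2.0 * sigma**2))
    return envelope * np.exp(2j * np.pi * freq * xr)

def gabor_magnitude(img, kernel):
    """Gabor magnitude response via FFT-based (circular) convolution."""
    h, w = img.shape
    return np.abs(np.fft.ifft2(np.fft.fft2(img) * np.fft.fft2(kernel, s=(h, w))))

rng = np.random.default_rng(0)
face = rng.random((64, 64))                 # stand-in for a registered face crop
jittered = np.roll(face, shift=2, axis=1)   # simulate a 2-pixel registration error

kern = gabor_kernel(size=15, freq=0.1, theta=0.0, sigma=3.0)
mag_ref = gabor_magnitude(face, kern)
mag_jit = gabor_magnitude(jittered, kern)

def rel_change(a, b):
    """Relative L2 change between two feature maps."""
    return np.linalg.norm(a - b) / np.linalg.norm(a)

pixel_change = rel_change(face, jittered)        # raw pixels: large change
magnitude_change = rel_change(mag_ref, mag_jit)  # Gabor magnitude: smaller
print(pixel_change, magnitude_change)
```

Because the magnitude field varies on the scale of the Gaussian envelope rather than the carrier, it should move noticeably less than the raw pixels under the same small shift; this is the intuition behind preferring such features when dense registration is noisy.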
1. Introduction

Central to the success of an automatic facial expression detector are the face alignment/registration algorithm and the visual features derived from it. As expressions can be subtle, high accuracy is desired so that the correspondence between various facial features and the muscles contracting and controlling the face can be maintained, enhancing the ability of a classifier to detect the facial expression correctly. To facilitate this, active appearance models (AAMs) [7] have been widely used in the field of affective computing, as they provide dense registration accuracy (i.e., 60-70 points on the face) so that these correspondences are kept, allowing comparisons of the relevant areas to be performed [2, 3, 13, 15].

Figure 1. This figure depicts the AAM representations employed in current state-of-the-art expression detection algorithms. Column (a) depicts the initial scenario in which all shape and appearance is preserved. In (b), geometric similarity is removed from both the shape and appearance; and in (c), shape (including similarity) has been removed, leaving the average face shape and what we refer to as the canonical appearance. Features derived from the representations in columns (b) and (c) are used in AAM expression detection systems. Two central questions addressed in this paper are: (i) how sensitive are AAM representations to registration noise, and (ii) are there alternate representations that can give greater invariance?

It has been well established [2, 15] when performing expression detection using AAM-derived representations (i.e., decoupled shape and appearance features; see Figure 1) that: (i) dense registration is preferable to coarse registration, and (ii) improved alignment accuracy is correlated with improved detection performance. This is a desired result if we have an automatic dense face alignment algorithm that can exhibit "human-like" accuracy (i.e., performance
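Column (b) of Figure 1 corresponds to removing geometric similarity (scale, rotation and translation) from the landmark shape. As an illustrative sketch of that step (our own minimal code, not the authors' AAM implementation; the 68-point toy shape and all names are assumptions), a least-squares 2-D similarity fit, i.e. a simple Procrustes alignment, can be written compactly using complex arithmetic:

```python
import numpy as np

def similarity_align(shape, ref):
    """Align an N x 2 landmark shape to a reference by a least-squares
    similarity transform (scale + rotation + translation). What remains
    after alignment is the non-rigid shape variation."""
    mu_s, mu_r = shape.mean(axis=0), ref.mean(axis=0)
    # centred 2-D points as complex numbers: a similarity is one complex
    # multiplication (scale * e^{i*theta}) plus a translation
    zs = (shape[:, 0] - mu_s[0]) + 1j * (shape[:, 1] - mu_s[1])
    zr = (ref[:, 0] - mu_r[0]) + 1j * (ref[:, 1] - mu_r[1])
    a = (zs.conj() @ zr) / (zs.conj() @ zs)   # closed-form LS solution
    za = a * zs
    return np.stack([za.real, za.imag], axis=1) + mu_r

# toy demo: a reference shape and a rotated/scaled/translated copy
rng = np.random.default_rng(1)
ref = rng.random((68, 2))                  # 68 landmarks, AAM-style density
theta, scale, t = 0.3, 1.4, np.array([5.0, -2.0])
R = np.array([[np.cos(theta), -np.sin(theta)],
              [np.sin(theta),  np.cos(theta)]])
shape = scale * ref @ R.T + t

aligned = similarity_align(shape, ref)
print(np.max(np.abs(aligned - ref)))
```

Because this toy shape differs from the reference by a pure similarity transform, alignment recovers the reference exactly (up to floating-point error); on real landmarks, the residual after alignment is the non-rigid expression deformation that the shape features describe.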