Recognizing Expressions from Face and Body Gesture by Temporal Normalized Motion and Appearance Features

Shizhi Chen and YingLi Tian
Department of Electrical Engineering
The City College of New York
New York, NY, USA
{schen21, ytian}@ccny.cuny.edu

Qingshan Liu and Dimitris N. Metaxas
Department of Computer Science
Rutgers University
Piscataway, NJ, USA
{qsliu, dnm}@cs.rutgers.edu

Abstract

Recognizing affect from both the face and body gestures has recently attracted growing attention. However, efficient and effective features for describing the dynamics of face and body gestures in real-time automatic affect recognition are still lacking. In this paper, we propose a novel approach that combines MHI-HOG and Image-HOG features through a temporal normalization method to describe the dynamics of face and body gestures for affect recognition. MHI-HOG applies the Histogram of Oriented Gradients (HOG) to the Motion History Image (MHI); it captures the motion direction of an interest point as an expression evolves over time. Image-HOG captures the appearance information of the corresponding interest point. The combination of MHI-HOG and Image-HOG effectively represents both the local motion and the appearance of face and body gestures for affect recognition. The temporal normalization method explicitly addresses the time-resolution issue in video-based affect recognition. Experimental results demonstrate promising performance compared with the state of the art. We also show that expression recognition with temporal dynamics outperforms frame-based recognition.

1. Introduction

Automatic affective computing has attracted increasing attention from the psychology, cognitive science, and computer science communities because of its practical importance for a wide range of applications, including intelligent human-computer interaction, law enforcement, and the entertainment industry. Many algorithms and systems have been proposed for automatic facial expression recognition. Generally, these methods fall into two categories: image-based and video-based approaches. Lanitis et al. [12] performed statistical analysis on static face images to model complex facial expressions; their model captures both shape and appearance features of facial expressions while accounting for different sources of variation, such as lighting changes and person identity. Guo and Dyer [10] also applied Gabor filters and large-margin classifiers to recognize facial expressions from face images. Both papers classify face images into the six basic universal expressions. Tian et al. [17] combined geometric and appearance features to recognize action units (AUs) of the Facial Action Coding System (FACS) proposed by Ekman and Friesen [7].

The temporal dynamics of facial expressions are crucial for interpreting facial behavior [14]. To incorporate expression dynamics into affect recognition, several researchers have explicitly segmented expressions into neutral, onset, apex, and offset phases. Figure 1 shows the temporal dynamics of a "Happiness" expression. Chen et al. [4] employ a Support Vector Machine (SVM) to temporally segment an expression into neutral, onset, apex, and offset phases by fusing motion-area and neutral-divergence features. Pantic and Patras [13] apply a rule-based method to temporally segment AUs into onset, apex, and offset phases from face-profile image sequences and then select the expressive frames for AU recognition. Tong et al. [18] employ a dynamic Bayesian network (DBN) to systematically account for temporal evolution in facial action unit recognition. Shan et al. [15] apply spatio-temporal interest points to describe body gestures for video-based affect recognition. Yang et al. [19] extract similarity features from onset-to-apex frames for facial expression recognition; their dynamic binary coding method implicitly embeds a time-warping operation to handle the time-resolution issue in video-based affect recognition.
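Time warping and temporal normalization both address the fact that expressions unfold at different speeds. As a minimal illustration of the temporal normalization idea (our own hypothetical sketch, not the paper's implementation), the following Python code linearly resamples a variable-length sequence of per-frame feature vectors to a fixed length T; the name temporal_normalize and the choice T = 20 are assumptions made for illustration.

```python
# Hypothetical sketch of temporal normalization: resample a variable-length
# sequence of per-frame feature vectors to a fixed length T by linear
# interpolation, so expressions of different durations yield descriptors
# of the same size. The function name and T=20 are illustrative assumptions.
import numpy as np

def temporal_normalize(features, T=20):
    """features: (n_frames, dim) array -> (T, dim) array."""
    features = np.asarray(features, dtype=np.float64)
    n_frames, dim = features.shape
    target = np.linspace(0.0, n_frames - 1, num=T)  # resampled time axis
    out = np.empty((T, dim))
    for d in range(dim):  # interpolate each feature dimension independently
        out[:, d] = np.interp(target, np.arange(n_frames), features[:, d])
    return out

# Example: a 37-frame clip and a 12-frame clip both map to a (20, dim)
# representation that a fixed-length classifier such as an SVM can consume.
```

Linear interpolation is only one possible choice here; the essential point is that the normalized sequence length becomes independent of the original clip duration.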
Inspired by psychology studies [1], which show that both the face and body gestures carry a significant amount of affective information, we propose to recognize expressions from both face and body gestures using temporally normalized motion and appearance features.

Figure 1: The temporal dynamics of the expression of "Happiness".
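To make the motion and appearance features concrete, here is a minimal sketch of the MHI-HOG and Image-HOG idea, assuming grayscale uint8 frames and the HOG implementation from scikit-image. The function names (motion_history_image, hog_patch) and all parameter values (tau, delta, patch) are illustrative assumptions, not the paper's settings.

```python
# Illustrative sketch (not the authors' code): build a Motion History Image
# from a frame sequence, then extract HOG descriptors around an interest
# point from both the MHI (motion) and the image (appearance).
import numpy as np
from skimage.feature import hog

def motion_history_image(frames, tau=15, delta=30):
    """Pixels that moved recently get high values; older motion decays
    linearly toward zero. frames: list of 2-D uint8 arrays."""
    mhi = np.zeros(frames[0].shape, dtype=np.float64)
    for prev, curr in zip(frames, frames[1:]):
        diff = np.abs(curr.astype(np.int16) - prev.astype(np.int16))
        mhi = np.where(diff > delta, tau, np.maximum(mhi - 1.0, 0.0))
    return mhi / tau  # normalize to [0, 1]

def hog_patch(image, point, patch=32):
    """HOG descriptor of a square patch centered on an interest point
    (assumed to lie at least patch/2 pixels from the border)."""
    y, x = point
    half = patch // 2
    window = image[y - half:y + half, x - half:x + half]
    return hog(window, orientations=9, pixels_per_cell=(8, 8),
               cells_per_block=(2, 2))

# MHI-HOG (motion) and Image-HOG (appearance) for one interest point pt:
# mhi = motion_history_image(frames)
# feature = np.concatenate([hog_patch(mhi, pt), hog_patch(frames[-1], pt)])
```

Concatenating the HOG of the MHI patch with the HOG of the image patch, as in the final comment, yields a single descriptor that carries both how a point is moving and what it looks like, which is the intuition behind combining the two features.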