EXPERIMENTS WITH FACIAL EXPRESSION RECOGNITION USING SPATIOTEMPORAL LOCAL BINARY PATTERNS

Guoying Zhao and Matti Pietikäinen

Machine Vision Group, Infotech Oulu and Department of Electrical and Information Engineering, P. O. Box 4500 FI-90014 University of Oulu, Finland
E-mail: {gyzhao, mkp}@ee.oulu.fi

ABSTRACT

In this paper, the recently introduced method for facial expression recognition using spatiotemporal local binary patterns is reviewed, and experiments are carried out to investigate the robustness of the approach. In experiments with the Cohn-Kanade facial expression database, our results from cross-validation with low resolutions and low frame rates are promising. Advantages of our approach include local processing, robustness to low video quality, and simple computation.

1. INTRODUCTION

A goal of facial expression recognition is to determine the emotional state of the face, e.g. happiness, sadness, surprise, neutral, anger, fear, and disgust, regardless of the identity of the face. The face can express emotions sooner than people verbalize or even realize their feelings [1], and research in social psychology has shown that facial expressions form the major modality in human communication [2]. Facial expression is thus one of the most powerful, natural and immediate means for human beings to communicate their emotions and intentions [3]. Even though much work has been done, recognizing facial expressions with high accuracy remains difficult due to the complexity and variety of facial expressions.

Pantic and Rothkrantz [4] surveyed the work done in automating facial expression analysis in facial images and image sequences. In another survey, Fasel and Luettin [5] introduced the most prominent automatic facial expression analysis methods and systems. They also discussed some facial motion and deformation extraction approaches as well as classification methods.
According to psychologists [6], analysis of sequences of images produces more accurate and robust recognition of facial expressions than using only single frames. Psychological studies have suggested that facial motion is fundamental to the recognition of facial expressions. Experiments conducted by Bassili [6] demonstrate that humans do a better job of recognizing expressions from dynamic images than from mug shots.

To use dynamic information in analyzing facial expressions, several systems [1, 8-10] attempt to recognize fine-grained changes in facial expression based on the Facial Action Coding System (FACS), which was developed by Ekman and Friesen [7] for describing facial expressions by action units (AUs). Other papers attempt to recognize a small set of prototypic emotional expressions, i.e. joy, surprise, anger, sadness, fear, and disgust. Our work focuses on the latter.

Yeasin et al. [11] applied the horizontal and vertical components of the optic flow as features. At the frame level, a k-NN rule was used to derive a characteristic temporal signature for every video sequence. At the sequence level, discrete HMMs were trained to recognize the temporal signatures associated with each basic expression. This method cannot deal with illumination variation, however. Manglik et al. [12] presented a method for extracting the positions of the eyes, eyebrows and mouth, and then determining the cheek and forehead regions. The optical flow procedure was applied to these regions, and the resulting vertical optical flow values were fed to a discrete Hopfield network. Their dataset included only 20 samples, on which they reported a result of 79.8%. Aleksic and Katsaggelos [13] exploited Facial Animation Parameters as features describing facial expressions, and utilized multi-stream Hidden Markov Models for recognition. The system is complex and thus difficult to run in real time. Cohen et al.
[14] introduced a Tree-Augmented-Naive Bayes classifier for recognition, but they experimented on a set of only five people, and the accuracy was only around 65% for person-independent evaluation.

Recently, the block-based approach based on local binary patterns (LBP), originally developed for single face images [16], was extended for the recognition of specific dynamic events such as facial expressions [17]. In the present paper, we review this approach and carry out additional experiments with low resolution images. We also perform cross-validation between different resolutions and frame rates to evaluate the performance of the proposed approach.

2. SPATIOTEMPORAL LOCAL BINARY PATTERNS

Local texture descriptors have gained increasing attention in facial image analysis due to their robustness to challenges
1091 1-4244-1017-7/07/$25.00 ©2007 IEEE ICME 2007