Weight-Based Facial Expression Recognition from Near-Infrared Video Sequences

Matti Taini, Guoying Zhao, and Matti Pietikäinen

Machine Vision Group, Infotech Oulu and Department of Electrical and Information Engineering, P. O. Box 4500, FI-90014 University of Oulu, Finland
{mtaini,gyzhao,mkp}@ee.oulu.fi

Abstract. This paper presents a novel weight-based approach to recognize facial expressions from near-infrared (NIR) video sequences. Facial expressions can be thought of as specific dynamic textures in which both local appearance and motion information need to be considered. The face image is divided into several regions, from which local binary patterns from three orthogonal planes (LBP-TOP) features are extracted to be used as a facial feature descriptor. The use of LBP-TOP features enables us to set a different weight for each of the three planes (appearance, horizontal motion and vertical motion) inside a block volume. The performance of the proposed method is evaluated on a novel NIR facial expression database. Assigning different weights to the planes according to their contribution improves the performance. NIR images are shown to cope with illumination variations better than visible light images.

Key words: Local binary pattern, region-based weights, illumination invariance, support vector machine

1 Introduction

Facial expression is natural, immediate and one of the most powerful means for human beings to communicate their emotions and intentions, and to interact socially. The face can express emotion sooner than people verbalize or even realize their feelings. To achieve truly effective human-computer interaction, the computer must be able to interact naturally with the user, in the same way as human-human interaction takes place. Therefore, there is a growing need to understand the emotions of the user. The most informative way for computers to perceive emotions is through facial expressions in video.
A novel facial representation for face recognition from static images based on local binary pattern (LBP) features divides the face image into several regions (blocks), from which the LBP features are extracted and concatenated into an enhanced feature vector [1]. This approach has also been used successfully for facial expression recognition [2], [3], [4]. LBP features from each block are extracted only from static images, meaning that temporal information is not taken into consideration. However, according to psychologists, analyzing a sequence of images leads to more accurate and robust recognition of facial expressions [5].
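The block-based LBP representation described above can be sketched in a few lines of Python. This is a minimal illustration, not the implementation from [1]: it uses a basic 8-neighbour LBP without circular interpolation or uniform-pattern mapping, and the grid size and image are placeholders chosen for the example.

```python
import numpy as np

def lbp_image(img):
    """Basic 8-neighbour LBP code map for the interior pixels
    (no interpolation, no uniform-pattern mapping)."""
    c = img[1:-1, 1:-1]  # centre pixels
    codes = np.zeros_like(c, dtype=np.uint8)
    # 8 neighbour offsets, each paired with a bit position in the code
    offsets = [(-1, -1), (-1, 0), (-1, 1), (0, 1),
               (1, 1), (1, 0), (1, -1), (0, -1)]
    h, w = img.shape
    for bit, (dy, dx) in enumerate(offsets):
        nb = img[1 + dy:h - 1 + dy, 1 + dx:w - 1 + dx]
        codes |= (nb >= c).astype(np.uint8) << bit
    return codes

def block_lbp_descriptor(img, grid=(4, 4)):
    """Divide the LBP code map into blocks, histogram each block,
    and concatenate the normalised histograms into one feature vector."""
    codes = lbp_image(img)
    feats = []
    for row in np.array_split(codes, grid[0], axis=0):
        for blk in np.array_split(row, grid[1], axis=1):
            hist, _ = np.histogram(blk, bins=256, range=(0, 256))
            feats.append(hist / blk.size)  # each block histogram sums to 1
    return np.concatenate(feats)

# Placeholder "face" image for illustration only
face = np.random.default_rng(0).integers(0, 256, (64, 64)).astype(np.uint8)
desc = block_lbp_descriptor(face, grid=(4, 4))
print(desc.shape)  # 4 x 4 blocks, 256 bins each -> (4096,)
```

Concatenating per-block histograms, rather than pooling one histogram over the whole face, is what preserves the spatial layout of the features; the LBP-TOP descriptor used in this paper extends the same idea by histogramming codes on three orthogonal planes of each spatio-temporal block volume.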