Neural Architecture for Temporal Emotion Classification

Roland Schweiger, Pierre Bayerl, and Heiko Neumann
Universität Ulm, Neuroinformatik, Germany
{roland.schweiger,pierre.bayerl,heiko.neumann}@informatik.uni-ulm.de

Abstract. In this pilot study, a neural architecture for temporal emotion recognition from image sequences is proposed. The investigation aims at developing key principles in an extendable experimental framework for studying human emotions. Features representing temporal facial variations are extracted within a bounding box around the face that is segregated into regions. Within each region, the optical flow is tracked over time. The dense flow field in each region is subsequently integrated, and its principal components are estimated as a representative velocity of face motion. For each emotion, a Fuzzy ARTMAP neural network is trained by incremental learning to classify the feature vectors produced by the motion-processing stage. Single category nodes corresponding to the expected feature representation code the respective emotion classes. The architecture was tested on the Cohn-Kanade facial expression database.

1 Introduction

The automated analysis of human behavior by means of computational vision techniques is a research topic that has gained increased attention, and several approaches have been proposed. For example, Mase [1] utilized the Facial Action Coding System (FACS) to describe expressions based on extracted muscle motions. Bascle et al. [2] tracked facial deformations by means of face templates generated from B-spline curves; key frames were selected to represent basic facial expressions. Most similar to our own approach, Essa and Pentland [3] extracted the spatio-temporal energy of facial deformations from image sequences to define dense templates of expected motions. Observed expressions of a human face were classified according to the most similar average motion pattern using a Bayesian classifier.
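The motion-feature pipeline outlined in the abstract — partitioning the face bounding box into regions, integrating the dense flow field per region, and estimating its principal components — can be illustrated with a minimal sketch. The grid partition size, the use of the mean flow as the integrated velocity, and the extraction of a single dominant principal direction per region are illustrative assumptions here, not the authors' exact parameters; the dense flow field itself would come from an optical-flow tracker.

```python
import numpy as np

def region_flow_features(flow, n_rows=4, n_cols=4):
    """Integrate a dense optical-flow field over a grid of face regions.

    flow: array of shape (H, W, 2) holding per-pixel (dx, dy) motion
    vectors inside the face bounding box. For each grid region, the mean
    flow (integrated velocity) and the dominant principal direction of
    the flow distribution are collected into one feature vector.
    """
    H, W, _ = flow.shape
    features = []
    for r in range(n_rows):
        for c in range(n_cols):
            patch = flow[r * H // n_rows:(r + 1) * H // n_rows,
                         c * W // n_cols:(c + 1) * W // n_cols]
            vecs = patch.reshape(-1, 2)
            mean_v = vecs.mean(axis=0)          # integrated (mean) velocity
            # principal component of the flow distribution in this region
            cov = np.cov(vecs - mean_v, rowvar=False)
            eigvals, eigvecs = np.linalg.eigh(cov)
            principal = eigvecs[:, np.argmax(eigvals)]
            features.extend([*mean_v, *principal])
    return np.asarray(features)
```

With a 4x4 grid this yields a 64-dimensional descriptor (four values per region), which would then serve as input to the classification stage.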
Unlike previous approaches, we propose a neural network architecture that aims at a framework for emotion recognition based on integrated velocities (amount and direction of motion) in different sectors of a human face. We introduce a simple framework for fast incremental neural network learning to classify different emotions. The architecture is extendable and can serve as a tool for experimental investigation; for example, it is flexible enough to allow the incorporation of features that represent the temporal coordination of emotions. In this pilot study, we utilize a supervised principle of incremental category allocation to represent different emotions. We evaluate the proposed network on a database of image sequences of facial expressions [4] and demonstrate its discriminative power.
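The supervised incremental allocation of category nodes can be sketched with a much-simplified Fuzzy-ARTMAP-style classifier: each training pattern either resonates with an existing category node of the same class (and refines its weights) or triggers the allocation of a new node. This sketch assumes features normalized to [0, 1], uses illustrative parameter values, and omits the match-tracking mechanism of the full Fuzzy ARTMAP algorithm; it is not the authors' implementation.

```python
import numpy as np

class SimpleFuzzyARTMAP:
    """Simplified Fuzzy-ARTMAP-style classifier with incremental
    category allocation (illustrative parameters; no match tracking)."""

    def __init__(self, rho=0.75, alpha=0.001, beta=1.0):
        self.rho, self.alpha, self.beta = rho, alpha, beta
        self.W, self.labels = [], []          # category weights and classes

    @staticmethod
    def _complement(x):
        # complement coding: inputs assumed normalized to [0, 1]
        return np.concatenate([x, 1.0 - x])

    def train(self, x, label):
        a = self._complement(np.asarray(x, float))
        # rank categories by the choice function T_j = |a ^ w_j| / (alpha + |w_j|)
        order = sorted(range(len(self.W)),
                       key=lambda j: -np.minimum(a, self.W[j]).sum()
                                     / (self.alpha + self.W[j].sum()))
        for j in order:
            match = np.minimum(a, self.W[j]).sum() / a.sum()
            if match >= self.rho and self.labels[j] == label:
                # fast learning: w <- beta*(a ^ w) + (1 - beta)*w
                self.W[j] = (self.beta * np.minimum(a, self.W[j])
                             + (1 - self.beta) * self.W[j])
                return
        self.W.append(a.copy())               # allocate a new category node
        self.labels.append(label)

    def predict(self, x):
        a = self._complement(np.asarray(x, float))
        scores = [np.minimum(a, w).sum() / (self.alpha + w.sum())
                  for w in self.W]
        return self.labels[int(np.argmax(scores))]
```

Because learning is incremental, new emotion classes or additional training sequences can be added without retraining from scratch, which is what makes this scheme attractive as an extendable experimental tool.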