2018 International Conference on Intelligent Systems (IS) 978-1-5386-7097-2/18/$31.00 ©2018 Crown

Emotion Recognition using Spatiotemporal Features from Facial Expression Landmarks

Hamid Golzadeh, Anikó Ekárt, Diego R. Faria, Christopher D. Buckingham, Luis J. Manso
School of Engineering & Applied Science, Aston University, Birmingham, UK
{golzadeh, a.ekart, d.faria, c.d.buckingham, l.manso}@aston.ac.uk

Abstract—Emotion expression is a form of nonverbal communication (i.e. wordless cues) between people, in which affect conveys interpersonal information through facial and/or body expressions. Much can be understood about how people are feeling from their expressions, which are crucial for everyday communication and interaction. This paper presents a study on spatiotemporal feature extraction based on tracked facial landmarks. The features are tested with multiple classification methods to verify whether they are discriminative enough for an automatic emotion recognition system. The Karolinska Directed Emotional Faces (KDEF) dataset [1] was used to derive features representing the human facial expressions of anger, disgust, happiness, sadness, fear, surprise and a neutral state. The resulting feature set was evaluated using K-fold cross-validation. Experimental results show that facial expressions can be recognised with an accuracy of up to 87% when using the newly developed features and a multiclass Support Vector Machine classifier.

Keywords—Facial Expressions, Emotion Recognition, Spatiotemporal Features, Classification, SVM, RFC, SAG.

I.
INTRODUCTION

Interactions among human beings are facilitated by the interpretation of emotions, which can be expressed in many ways, including body language, voice intonation and facial expressions. Machine interpretation can rely on technologies such as electroencephalography or voice analysis to detect emotions [2], but analysing facial expressions offers easier practical methods. Seven forms of human emotion are said to be recognisable in faces across different cultures [3]: disgust, contempt, happiness, anger, surprise, fear and sadness.

Facial expression recognition (FER) therefore plays a very important role in improving the quality of human communication and can be usefully exploited by machines. For example, at airports, FER can be used as a security-check method to flag unexpected emotional states of travellers or to support the investigation of suspected criminals. FER can also have medical applications, such as assessing patients' reactions before or after surgery with respect to pain, stress or anxiety.

In mental health, computers can play an important role in gaining information from people who are reluctant to talk to a human because of the stigma surrounding mental-health problems [4]. Avatars are becoming important components of e-mental health interventions [5] and can help improve engagement. One way is asking the right questions in the right way, which was a central motivation for developing the myGRaCE self-assessment version [6] of GRiST (www.egrist.org) for early detection of risks such as suicide and violence. However, virtual avatars also need to show appropriate emotional responses during interactions if they are to maximise therapeutic benefit, and the research reported in this paper is an important step towards that goal.
Automatic recognition of human emotions is a difficult task for at least two reasons: (i) no large database of labelled training images with realistic (not acted) emotions exists; (ii) static images capturing a single point in time are hard to classify with confidence, because facial expressions change quickly and the transitions between frames carry important information.

In this paper, we present a study on facial expression recognition in which emotions are recognised automatically from a dataset and the approach can be applied dynamically to live video. Multiple machine learning techniques, including Random Forest Classification (RFC), Support Vector Machines (SVM) and logistic regression trained with the Stochastic Average Gradient (SAG) solver, have been tested. A set of spatiotemporal features based on tracked facial landmarks is presented and tested with multiple classifiers to verify its discriminative power. The main contributions of this paper are:

- A new and effective set of spatiotemporal features based on 1D and 2D distances among facial landmarks, log-covariance, angles, derivatives, log-energy and angular velocities.
- Experimental tests and comparison of multiple state-of-the-art machine learning algorithms for emotion classification, validating the effectiveness of the feature set selected from an affective facial expression dataset.

A system has been developed by which human emotions can be detected in real time under different lighting conditions, scenes and viewing angles. The remainder of this paper is structured as follows.
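To make the feature families above concrete, the following is a minimal sketch of how spatiotemporal features of this kind can be computed from tracked landmarks. It is an illustration under our own assumptions, not the paper's exact pipeline: the function names, the `(T, N, 2)` landmark array layout, and the choice of pairwise distances, temporal derivatives and log-covariance as representatives of the feature set are all assumptions for this example.

```python
import numpy as np

def pairwise_distances(frame):
    # 2D Euclidean distances between all landmark pairs (upper triangle)
    diffs = frame[:, None, :] - frame[None, :, :]
    d = np.linalg.norm(diffs, axis=-1)
    iu = np.triu_indices(frame.shape[0], k=1)
    return d[iu]

def log_covariance(features):
    # matrix logarithm of the feature covariance via eigendecomposition;
    # a small ridge keeps the matrix positive definite
    cov = np.cov(features, rowvar=False) + 1e-6 * np.eye(features.shape[1])
    w, v = np.linalg.eigh(cov)
    logm = (v * np.log(w)) @ v.T
    iu = np.triu_indices(logm.shape[0])
    return logm[iu]

def spatiotemporal_features(landmarks):
    # landmarks: (T, N, 2) array of N tracked facial landmarks over T frames
    per_frame = np.array([pairwise_distances(f) for f in landmarks])
    velocities = np.diff(per_frame, axis=0)  # temporal derivatives of distances
    return np.concatenate([per_frame.mean(axis=0),
                           velocities.mean(axis=0),
                           log_covariance(per_frame)])
```

A descriptor of this shape (static geometry, its temporal change, and second-order statistics over the sequence) is what the classifiers in the experiments would consume; the real system additionally uses angles, log-energy and angular velocities.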