A machine learning approach to predict emotional arousal and valence from gaze extracted features

Vasileios Skaramagkas*, Emmanouil Ktistakis*†, Dimitris Manousos*, Nikolaos S. Tachos, Eleni Kazantzaki*, Evanthia E. Tripoliti§, Dimitrios I. Fotiadis‡§ and Manolis Tsiknakis*

* Institute of Computer Science, Foundation for Research and Technology Hellas (FORTH), GR-700 13 Heraklion, Crete, Greece, Email: vskaramag@ics.forth.gr
Laboratory of Optics and Vision, School of Medicine, University of Crete, GR-710 13 Heraklion, Crete, Greece
Dept. of Biomedical Research, Institute of Molecular Biology and Biotechnology (FORTH), GR-451 10, Ioannina, Greece
§ Dept. of Materials Science and Engineering, Unit of Medical Technology and Intelligent Information Systems, University of Ioannina, GR-451 10, Ioannina, Greece
Dept. of Electrical and Computer Engineering, Hellenic Mediterranean University, GR-710 04 Heraklion, Crete, Greece

Abstract—In recent years, many studies have investigated emotional arousal and valence. Most of them have focused on the use of physiological signals such as EEG or EMG, cardiovascular measures, or skin conductance. However, eye-related features have proven to be helpful and easy-to-use metrics, especially pupil size and blink activity. The aim of this study is to predict the levels of emotional arousal and valence induced during emotionally charged situations from eye-related features. To this end, we performed an experimental study in which participants watched emotion-eliciting videos and self-assessed their emotions while their eye movements were being recorded. In this work, several classifiers, such as kNN, SVM, Naive Bayes, decision trees, and ensemble methods, were trained and tested. Emotional arousal and valence levels were predicted with success rates of 85% and 91%, respectively.

I. INTRODUCTION

Among the various dimensional models of affect, the two-dimensional arousal-valence emotion space of Russell [1] is the most commonly used. Emotional valence describes the extent to which an emotion is positive or negative [2], whereas arousal refers to the level of calmness (i.e., low arousal) or excitation (i.e., high arousal) elicited by a stimulus [3].

Physiological signals combined with eye-related metrics are the modality most commonly used to estimate one's emotional state [4]. However, several studies have used eye features as the only predictor of emotional arousal and valence levels. These studies address either multi-class [5], [6], [7] or binary classification problems [8], [9]; success rates for the multi-class cases remain below 80%, whereas the binary classification approaches have proven more effective, reaching up to 93% prediction success. Nevertheless, none of the aforementioned studies attempts to discriminate between arousal and valence levels in parallel. A meta-analysis of the related studies has shown that the gaze-extracted features that best indicate emotional arousal are pupil diameter and blink duration [3].

This work reports the results of a study in which participants watched emotion-evoking video clips while an eye tracker captured eye motion and activity. The features extracted from all acquired gaze signals were used to train and evaluate a set of different classification algorithms, including decision trees, discriminant analysis, support vector machines (SVM), k-nearest neighbors (kNN) and ensemble learning algorithms, aiming to accurately classify the various arousal and valence levels.
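To make the evaluation procedure concrete, the following sketch illustrates how such a set of candidate classifiers can be compared with cross-validation in scikit-learn. It is not the authors' implementation: the feature matrix X (gaze-extracted features per trial), the label vector y (self-assessed arousal or valence level per trial), and all hyper-parameters shown are placeholder assumptions.

```python
"""Illustrative sketch of the classifier comparison described above.

Not the authors' implementation: X (gaze features per trial), y (arousal
or valence labels per trial) and all hyper-parameters are placeholders.
"""
import numpy as np
from sklearn.model_selection import cross_val_score
from sklearn.tree import DecisionTreeClassifier
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis
from sklearn.svm import SVC
from sklearn.neighbors import KNeighborsClassifier
from sklearn.ensemble import RandomForestClassifier

# Placeholder data: replace with real gaze features and self-assessed labels.
rng = np.random.default_rng(0)
X = rng.normal(size=(370, 12))    # hypothetical: 370 trials x 12 gaze features
y = rng.integers(0, 2, size=370)  # hypothetical: binary arousal level (low / high)

# Candidate classifier families named in the text (ensemble shown as a random forest).
classifiers = {
    "Decision tree": DecisionTreeClassifier(),
    "Discriminant analysis": LinearDiscriminantAnalysis(),
    "SVM": SVC(kernel="rbf"),
    "kNN": KNeighborsClassifier(n_neighbors=5),
    "Ensemble (random forest)": RandomForestClassifier(n_estimators=100),
}

# 5-fold cross-validated accuracy for each candidate classifier.
for name, clf in classifiers.items():
    scores = cross_val_score(clf, X, y, cv=5, scoring="accuracy")
    print(f"{name}: {scores.mean():.3f} +/- {scores.std():.3f}")
```

In practice, the same loop would be run twice, once with arousal labels and once with valence labels, and the best-performing model would be selected separately for each target.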
II. PROTOCOL FOR DATA COLLECTION

In the present study, 37 participants (22 female, 15 male) with a mean age of 29 (SD: 7) years were enrolled. Binocular visual acuity at 80 cm was measured before each trial (mean VA: -0.10±0.07 logMAR). Mean illuminance at the cornea with the screen on was 450 (SD: 24) lux.

Fig. 1. The experimental setup