Affect Recognition in a Realistic Movie Dataset Using a Hierarchical Approach

Joël Dumoulin, Diana Affi, Elena Mugellini, Omar Abou Khaled
HumanTech Institute, University of Applied Sciences, Fribourg, Switzerland
[name.surname]@hes-so.ch

Marco Bertini, Alberto Del Bimbo
MICC, University of Florence, Florence, Italy
[name.surname]@unifi.it

ABSTRACT
Affective content analysis has gained great attention in recent years and is an important challenge of content-based multimedia information retrieval. In this paper, a hierarchical approach is proposed for affect recognition in movie datasets. This approach has been verified on the AFEW dataset, showing an improvement in classification results compared to the baseline. In order to exploit all the visual sentiment aspects contained in the movie excerpts of a realistic dataset such as FilmStim, deep learning features trained on a large set of emotional images are added to the standard audio and visual features. The proposed approach will be integrated in a system that communicates the emotions of a movie to impaired people and contributes to improving their television experience.

Categories and Subject Descriptors
H.3.3 [Information Storage and Retrieval]: Information Search and Retrieval; H.3.1 [Information Storage and Retrieval]: Content Analysis and Indexing—Indexing Methods

General Terms
Algorithms; Experimentation

Keywords
Sentiment analysis; video analysis

1. INTRODUCTION
People suffering from hearing or visual impairments are not able to fully enjoy a movie on television. More and more innovative features are proposed by TV manufacturers (e.g. better screens, voice or gesture recognition, social apps), but these features do not improve the experience of impaired viewers.
ASM'15, October 30, 2015, Brisbane, Australia. © 2015 ACM. ISBN 978-1-4503-3750-2/15/10. DOI: http://dx.doi.org/10.1145/2813524.2813526

Existing solutions such as subtitles or audio description help to understand the content of a movie, but they deliver static cognitive information that lacks the entire affective level, which is an important aspect of a movie. Building a system able to understand the emotions contained in movies, and to communicate them to impaired viewers, could bring them this missing information and contribute to improving their TV experience.

In this paper we present a multimodal approach for affect recognition in movies, based on audio features and visual features that account for both content and sentiment. The proposed approach has been integrated in a first prototype that communicates the emotions of a movie in a multimodal way, using smart objects (e.g. smart lights, a smartwatch). One channel is an ambient lighting system (see Fig. 1): the intensity of the emotion is mapped to the brightness, and each type of emotion corresponds to a light color.

This paper shows that a hierarchical approach and the use of a deep CNN model [1] trained on a large set of emotional images help in the affect recognition task on a realistic movie dataset. The rest of this paper is organized as follows: related work is discussed in Section 2.
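As a concrete illustration of the ambient lighting channel described above, the mapping can be sketched as a lookup from emotion class to a base color, scaled by the predicted intensity. This is a minimal, hypothetical sketch: the specific emotion set and color choices are illustrative assumptions, not the exact design used in the prototype.

```python
# Hypothetical sketch of the ambient lighting channel: each emotion
# class maps to a light color, and the predicted emotion intensity
# (in [0, 1]) scales the brightness. Colors and emotion labels here
# are illustrative assumptions, not the paper's exact design.

EMOTION_COLORS = {
    "anger":     (255, 0, 0),    # red
    "happiness": (255, 200, 0),  # warm yellow
    "sadness":   (0, 0, 255),    # blue
    "fear":      (128, 0, 128),  # purple
}

def emotion_to_light(emotion, intensity):
    """Return an (r, g, b) color for the light, scaled by intensity in [0, 1]."""
    base = EMOTION_COLORS.get(emotion, (255, 255, 255))  # white fallback
    intensity = max(0.0, min(1.0, intensity))            # clamp to [0, 1]
    return tuple(int(round(c * intensity)) for c in base)
```

For example, a strongly recognized "anger" segment would drive the lamp to full red, while a weaker one dims the same hue proportionally.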
The proposed emotion recognition system is described in Section 3. Experimental results on two movie datasets are discussed in Section 4, followed by conclusions and perspectives in Section 5.

Figure 1: Overall schema of the system for affect recognition and communication to viewers. The lighting system is used to represent the emotion and its intensity.

2. RELATED WORK
Affective computing is one of the main challenges of content-based multimedia information retrieval [7]. Some works focus on images [17] while others focus on sound [14]. When it comes to emotional analysis of videos, three main approaches have been explored. One approach tries to understand the emotions expressed by a person in a video, for instance by analyzing audio, visual and spontaneous expressions