Characters Identification in TV series

Madjid Maidi, Veronica Scurtu and Marius Preda
ARTEMIS Department, Telecom SudParis/Institut TELECOM
9, Rue Charles Fourier, 91011 Evry, France
{madjid.maidi,veronica.scurtu,marius.preda}@it-sudparis.eu

Abstract—This work aims to build a recognition system for a software engine that automatically generates a quiz from video content and reinserts it into the video, thus turning any available foreign-language video (such as news or TV series) into an interactive learning tool. Our system includes a face tracking application that integrates the eigenface method with a temporal tracking approach. The main part of our work is to detect and identify faces in movies and to associate specific quizzes with each recognized character. The proposed approach labels the detected faces and maintains face tracking along the video stream. This task is challenging because characters show significant variation in their appearance. We therefore employed eigenfaces to reconstruct the original image from training models, and we developed a new technique based on frame buffering for continuous tracking under unfavorable environmental conditions. The tests we conducted showed that our system is able to identify multiple characters, and the results demonstrate the performance and effectiveness of the proposed method.

Index Terms—face recognition; Principal Component Analysis; Linear Discriminant Analysis; temporal tracking

I. INTRODUCTION

In a world where digitalized content and more developed internet-based networks have accelerated the production and distribution of media content and increased the individualization and personalization of media consumption, access to knowledge and education has also gained a global dimension in terms of agent, time and location. Leading fast digital lives, young people prefer learning methods with a game-like, intuitive and interactive interface.
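The abstract combines eigenface recognition with frame buffering for continuous tracking. The sketch below illustrates both ideas in Python with NumPy under our own assumptions: faces are already detected, cropped to a common size and flattened into grayscale vectors; identification is nearest neighbour in face space, with the reconstruction error used to reject crops that are not faces; and the buffer is a simple majority vote over the labels of the last few frames. The class names, `num_components`, `max_error` and the buffer size are hypothetical, and the buffering scheme is an illustrative stand-in, not the technique developed in this work.

```python
from collections import Counter, deque

import numpy as np


class EigenfaceRecognizer:
    """Minimal eigenface (PCA) recognizer over flattened, equally sized
    grayscale face crops. num_components is an illustrative choice."""

    def __init__(self, num_components=8):
        self.num_components = num_components

    def fit(self, faces, labels):
        X = np.asarray(faces, dtype=float)           # shape (M, P): M faces, P pixels
        self.mean = X.mean(axis=0)
        A = X - self.mean                             # centred training faces
        # Rows of vt are orthonormal eigenvectors of A^T A (the unnormalised
        # covariance matrix), i.e. the eigenfaces, sorted by singular value.
        _, _, vt = np.linalg.svd(A, full_matrices=False)
        self.eigenfaces = vt[: self.num_components]   # shape (K, P)
        self.weights = A @ self.eigenfaces.T          # training projections (M, K)
        self.labels = list(labels)
        return self

    def project(self, face):
        # Coordinates of the face in the K-dimensional face space.
        return (np.asarray(face, dtype=float) - self.mean) @ self.eigenfaces.T

    def reconstruct(self, face):
        # Rebuild the image from its eigenface coefficients.
        return self.mean + self.project(face) @ self.eigenfaces

    def identify(self, face, max_error=None):
        """Return (label, distance) of the nearest training face in face space,
        or (None, inf) if the reconstruction error exceeds max_error."""
        face = np.asarray(face, dtype=float)
        error = np.linalg.norm(face - self.reconstruct(face))
        if max_error is not None and error > max_error:
            return None, float("inf")
        dists = np.linalg.norm(self.weights - self.project(face), axis=1)
        best = int(np.argmin(dists))
        return self.labels[best], float(dists[best])


class LabelBuffer:
    """Hypothetical frame buffer: keeps the labels of the last `size` frames
    and reports the majority label, so a frame where recognition fails
    (label None) does not break the track."""

    def __init__(self, size=5):
        self.history = deque(maxlen=size)

    def update(self, label):
        if label is not None:
            self.history.append(label)
        if not self.history:
            return None
        return Counter(self.history).most_common(1)[0][0]
```

The reconstruction error measures how well a crop is explained by the eigenfaces (its distance from face space), while the projection distance compares it with the known characters; the buffer then smooths the per-frame decisions so that a single missed or uncertain frame keeps the previously established label.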
Foreign language learning is no exception in this respect, and new educational content, both more efficient and more entertaining, must be created. Our approach, which aims to fill part of the gap between old-school educational formats based on the "communicative approach" [1] and young people's expectations, consists of developing a demonstrator that turns any available foreign-language video into an attractive and interactive learning tool for the language in question, combining the comprehensive approach [2] with the development of listening comprehension competencies [3]. Educational games have been shown to have a positive impact on students' results in English: a 2007 Japanese study [4] demonstrated this for junior high school students using the Nintendo DS "English training" game (1 million units sold in Japan). Our system is therefore designed to encourage regular practice by combining the interactivity of the quiz with a game-like scoring system, so that it will be perceived, especially by younger users, as entertainment rather than education. The novelty of our work lies in proposing a new interactive, individualized and motivating way to learn a foreign language while watching and enjoying one's favourite original-language video on a media device (laptop, PC, mobile phone, etc.). The system takes as input a video file and the associated subtitle file. On the server side, the face recognition and text analysis plugins are executed to generate the quiz and the associated answers. The output of the execution chain is an enriched video consisting of the initial video, the subtitles and the associated quiz.
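The execution chain described above can be sketched as a set of independent plugins whose outputs are merged into one enriched video. This is a minimal Python illustration under our own assumptions: the names `QuizItem`, `EnrichedVideo`, `Plugin` and `run_chain`, and the convention that each plugin maps the (video file, subtitle file) pair to a list of timestamped quiz items, are hypothetical and do not describe the actual server implementation.

```python
from dataclasses import dataclass, field
from typing import Callable, List


@dataclass
class QuizItem:
    """One generated question (hypothetical structure)."""
    timestamp: float          # position in the video, in seconds
    question: str
    answers: List[str]
    correct_index: int        # index of the correct answer in `answers`


@dataclass
class EnrichedVideo:
    """Output of the chain: the initial video, its subtitles and the quiz."""
    video_path: str
    subtitle_path: str
    quiz: List[QuizItem] = field(default_factory=list)


# A plugin (e.g. face recognition or text analysis) maps the input pair
# (video file, subtitle file) to the quiz items it is able to generate.
Plugin = Callable[[str, str], List[QuizItem]]


def run_chain(video_path: str, subtitle_path: str,
              plugins: List[Plugin]) -> EnrichedVideo:
    enriched = EnrichedVideo(video_path, subtitle_path)
    for plugin in plugins:
        enriched.quiz.extend(plugin(video_path, subtitle_path))
    # Order questions by their position so they can be reinserted in sequence.
    enriched.quiz.sort(key=lambda item: item.timestamp)
    return enriched
```

For example, a face recognition plugin could emit a "Who" question at the timestamp where a known character appears, while a text analysis plugin would draw keyword questions from the subtitles; keeping the plugins independent lets new analysers be added without changing the chain itself.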
To achieve its goal, our solution brings together areas of expertise that rarely work together: ontology definition, video analysis and metadata extraction, information search, web crawling, and the most recent findings from research on the performance of educational systems in second language acquisition. Despite extensive research on multimedia descriptor extraction, user expectations in this field have not yet been met, because of the semantic gap [5] between what can be extracted from video content and what can be used for semantic search and retrieval. One of our most important objectives is to reach high-level information using low-level visual data from the video content. The objects of interest used to build the quizzes will be detected and retrieved through audio-visual analysis of the video content. Questions and correct answers are generated from the detected objects of interest, ranging from the simplest (keywords) to the more complex (faces), which also serve as query examples for searching remote databases. While a wider range of questions can be created via text analysis, only "Who", "Where" or "What" questions can be generated using visual descriptors. For example, feature extraction for face detection and recognition is used to identify a person performing an action or simply present in the scene, while color descriptors and/or basic object recognition (household objects, logos, etc.) are used for general knowledge questions. In this paper, we describe the face recognition methods used in our work. In the language learning context, TV series are the best material for testing the potential of visual descriptor extraction, as characters and objects recur over the long run. The remainder of this paper is structured as follows: Section 2 reviews related work. Section 3 presents our