S. Li et al. (Eds.): MMM 2013, Part I, LNCS 7732, pp. 368–379, 2013.
© Springer-Verlag Berlin Heidelberg 2013

Flexible Presentation of Videos Based on Affective Content Analysis

Sicheng Zhao, Hongxun Yao, Xiaoshuai Sun, Xiaolei Jiang, and Pengfei Xu
School of Computer Science and Technology, Harbin Institute of Technology,
No. 92, West Dazhi Street, Harbin, P.R. China, 150001
{zsc,h.yao,xiaoshuaisun,xljiang,pfxu}@hit.edu.cn

Abstract. The explosion of multimedia content has created a great demand for video presentation techniques. While most previous works focus on presenting a particular type of video or on summarizing videos by event detection, we propose a novel method for presenting general videos of different genres based on affective content analysis. We first extract rich audio-visual affective features and select the discriminative ones. We then map the selected features to the corresponding affective states in an improved categorical emotion space using hidden conditional random fields (HCRFs). Finally, we draw affective curves that indicate the types and intensities of emotions. Using these curves and related affective visualization techniques, we select the most affective shots and concatenate them to construct an affective video presentation whose type and length are flexible and adjustable. Experiments on a representative video database collected from the web demonstrate the effectiveness of the proposed method.

Keywords: Video presentation, affective analysis, emotion space, HCRFs.

1 Introduction

The explosion of multimedia content has created a great demand for video presentation techniques. On one hand, viewers need to get a gist of a video's content and watch its highlights, due to time limits, before deciding whether to view the entire video (e.g., a movie). On the other hand, video broadcast platforms, especially television stations, must review substantial numbers of videos and select legal and valuable ones to broadcast, which is a time-consuming and tedious task.
Thus, effective video presentation techniques can make video reviewers' work more convenient and efficient.

Most previous works on content-based video presentation focus on a certain type of video, such as sports videos or home videos, or summarize videos by event detection [1-5]. Liu et al. [1] proposed a flexible content summarization framework for racquet sports videos that combines a structure-event detection method with a highlight ranking algorithm. Zhao et al. [2] proposed a system for highlight summarization in sports videos based on replay detection. Based on three video properties (emotional tone, local main character, and global main character), Xiang and Kankanhalli [3] employed affective analysis to automatically create adaptive presentations from home videos for three types of social groups: family, acquaintance
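The abstract's pipeline ends by selecting the most affective shots from the affective curves and concatenating them into a presentation of flexible type and length. The following is a minimal sketch of that final selection step only; the `Shot` structure and the greedy pick-by-intensity strategy are illustrative assumptions, not the paper's exact algorithm, and they presume each shot has already been assigned an emotion label and an intensity by the HCRF stage.

```python
from dataclasses import dataclass

@dataclass
class Shot:
    start: float      # shot start time in seconds
    end: float        # shot end time in seconds
    emotion: str      # predicted affective state, e.g. "joy" or "fear"
    intensity: float  # emotion intensity read from the affective curve

def build_presentation(shots, emotion, max_length):
    """Greedily pick the most intense shots of the requested emotion type
    until the target length would be exceeded, then restore temporal order
    so the presentation plays back in the original shot sequence."""
    candidates = sorted(
        (s for s in shots if s.emotion == emotion),
        key=lambda s: s.intensity,
        reverse=True,
    )
    chosen, total = [], 0.0
    for s in candidates:
        duration = s.end - s.start
        if total + duration > max_length:
            continue  # skip shots that would overflow the requested length
        chosen.append(s)
        total += duration
    return sorted(chosen, key=lambda s: s.start)
```

Because both the target emotion and `max_length` are parameters, the same labeled shot list yields presentations of different types and lengths, which is the flexibility the method aims for.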