Violence Detection in Movies

Liang-Hua Chen, Hsi-Wen Hsu and Li-Yun Wang
Department of Computer Science and Information Engineering
Fu Jen University, Taipei, Taiwan

Chih-Wen Su
Department of Information and Computer Engineering
Chung Yuan University, Chung-Li, Taiwan

Abstract

As violence in movies has a harmful influence on children, in this paper we propose an algorithm to detect violent scenes in movies. Under our definition of violence, the task of violent scene detection is decomposed into action scene detection and bloody frame detection. While previous approaches address only the shot level of video structure, our approach works on the more semantically complete scene structure of video. The input video (digital movie) is first segmented into several scenes. Based on the film-making characteristics of action scenes, some features of each scene are extracted and fed into a support vector machine for classification. Finally, face, blood and motion information are integrated to determine whether the action scene has violent content. Experimental results show that the proposed approach works reasonably well in detecting most of the violent scenes. Compared with related work, our approach is computationally simple yet effective.

1 Introduction

Advances in low-cost mass storage devices, higher transmission rates and improved compression techniques have led to the widespread use and availability of digital video. Nowadays, anyone can easily download movies using a home computer. However, violence in movies has a harmful influence on children. It was reported that children who liked to watch violent TV programs when they were 8 years old were more likely to behave aggressively at age 18 [1]. To prevent children from watching violent movies, the automatic detection of inappropriate violence in movies is of substantial importance.
For content providers, violence detection can assist in movie rating; for end users, it can block violent content in client terminal devices. On the other hand, violent scenes attract attention and make viewers curious. They are usually the highlights of a movie. Therefore, violence detection would also be useful for movie skimming.

In this paper, we propose an empirically motivated approach for violence detection in movies. The task of violent scene detection is decomposed into action scene detection and bloody frame detection. Our approach is based on the integration of visual characteristics and temporal dynamics information of video. The rest of this paper is organized as follows. In the next section, we review some related work and give the motivation for our approach. An action scene detection algorithm is presented in Section 3. In Section 4, we describe how to integrate several visual features to detect violent content. The performance evaluation of our approach is reported in Section 5. Finally, some concluding remarks are given in Section 6.

2 Background and Motivation

Relatively few approaches have been proposed for the problem of violent scene detection in video. The main reason is that the definition of violence is ambiguous. It is difficult to describe this high-level concept precisely with a mathematical formulation. Each related work addressed the problem under its own definition of violence. Depending on the type of video features used, current techniques for violence detection can be broadly classified into three categories. The first is based on visual cues. Using motion trajectory information and orientation information of a person’s limbs, Datta et al. addressed the problem of detecting human violence in video, such as fist fighting and kicking [2]. Their approach relies on extracting the silhouette of each person from the image. Thus it works well only in the presence of two persons.
Mecocci and Micheli proposed using maximum warping energy as a criterion to detect violent acts among larger groups of people in crowded environments [3]. However, it is still difficult to differentiate fighting from basketball playing with this approach. Note also that both approaches [2, 3] use video data from surveillance cameras and are not suitable for movies, which have large camera movements. The second category is the audio-based approach. Giannakopoulos et al. used eight audio features, from both the time and frequency domains, as input to a binary classifier that decides whether the video content is violent [4]. Then, they extended their work to multi-

2011 Eighth International Conference Computer Graphics, Imaging and Visualization 978-0-7695-4484-7/11 $26.00 © 2011 IEEE DOI 10.1109/CGIV.2011.14