Violence Detection in Movies
Liang-Hua Chen, Hsi-Wen Hsu and Li-Yun Wang
Department of Computer Science and Information Engineering
Fu Jen University, Taipei, Taiwan
Chih-Wen Su
Department of Information and Computer Engineering
Chung Yuan University, Chung-Li, Taiwan
Abstract
As violence in movies has a harmful influence on children,
we propose in this paper an algorithm to detect violent
scenes in movies. Under our definition of violence, the
task of violent scene detection is decomposed into action
scene detection and bloody frame detection. While previous
approaches addressed only the shot level of video
structure, our approach works on the more semantically
complete scene structure of video. The input video (a
digital movie) is first segmented into several scenes.
Based on the filmmaking characteristics of action scenes,
features of each scene are extracted and fed into a support
vector machine for classification. Finally, face, blood and
motion information are integrated to determine whether the
action scene has violent content. Experimental results show
that the proposed approach works reasonably well in
detecting most of the violent scenes. Compared with related
work, our approach is computationally simple yet effective.
1 Introduction
Advances in low-cost mass storage devices, higher
transmission rates and improved compression techniques
have led to the widespread use and availability of digital
video. Nowadays, anyone can easily download movies using
a home computer. However, violence in movies has a
harmful influence on children. It was reported that children
who liked to watch violent TV programs when they were
8 years old were more likely to behave aggressively at age
18 [1]. To prevent children from watching violent movies,
the automatic detection of inappropriate violence in movies
is of substantial importance. For content providers,
violence detection can assist in movie rating; for end
users, it can block violent content on client terminal
devices. On the other hand, violent scenes attract
attention and make viewers curious. They are usually the
highlights of a movie. Therefore, violence detection would
also be useful for movie skimming.
In this paper, we propose an empirically motivated
approach for violence detection in movies. The task of
violent scene detection is decomposed into action scene
detection and bloody frame detection. Our approach is based
on the integration of visual characteristics and temporal
dynamics information of video. The rest of this paper is
organized as follows. In the next section, we review
related work and give the motivation for our approach. An
action scene detection algorithm is presented in Section 3.
In Section 4, we describe how to integrate several visual
features to detect violent content. The performance
evaluation of our approach is reported in Section 5. Finally,
some concluding remarks are given in Section 6.
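The two-stage decomposition described above can be illustrated with a minimal Python sketch. It is not the authors' implementation: the `Scene` container, the function `detect_violent_scenes`, and the two predicate arguments are hypothetical names introduced here, and the rule that a scene is flagged only when both stages agree is an assumption based on the description in this section. The real detectors (the SVM-based action scene classifier and the integration of face, blood and motion cues) are replaced by toy placeholder predicates.

```python
from dataclasses import dataclass
from typing import Callable, List, Sequence


@dataclass
class Scene:
    """Hypothetical container: a scene is a sequence of frames."""
    frames: Sequence[object]


def detect_violent_scenes(
    scenes: Sequence[Scene],
    is_action_scene: Callable[[Scene], bool],
    has_bloody_frame: Callable[[Scene], bool],
) -> List[int]:
    """Return the indices of scenes flagged as violent.

    Assumption: a scene is flagged only if it is first classified as an
    action scene and then also passes the bloody-frame check.
    """
    violent = []
    for index, scene in enumerate(scenes):
        # Stage 1: only action scenes are candidates for violence.
        if not is_action_scene(scene):
            continue
        # Stage 2: check the candidate scene for bloody content.
        if has_bloody_frame(scene):
            violent.append(index)
    return violent


# Toy data: frames are plain strings, and both predicates are stand-ins.
scenes = [
    Scene(frames=["a", "b"]),
    Scene(frames=["a", "blood", "c", "d"]),
    Scene(frames=["a", "b", "c", "d"]),
]
flagged = detect_violent_scenes(
    scenes,
    is_action_scene=lambda s: len(s.frames) >= 4,   # placeholder rule
    has_bloody_frame=lambda s: "blood" in s.frames,  # placeholder rule
)
print(flagged)  # prints [1]
```

Passing the two detectors in as functions keeps the control flow of the decomposition separate from how each stage is actually computed, so either stage can be swapped out without touching the loop.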
2 Background and Motivation
Relatively few approaches have been proposed for the
problem of violent scene detection in video. The main
reason is that the definition of violence is ambiguous. It is
difficult to describe this high-level concept precisely with
a mathematical formulation. Each related work addressed the
problem using its own definition of violence. Depending on
the type of video features used, current techniques for
violence detection can be broadly classified into three
categories. The first is based on visual cues. Using motion
trajectory information and orientation information of a
person’s limbs, Datta et al. addressed the problem of
detecting human violence in video, such as fist fighting and
kicking [2]. Their approach relies on extracting the
silhouette of each person from the image. Thus it works well
only in the presence of two persons. Mecocci and Micheli
proposed using maximum warping energy as a criterion to
detect violent acts among more people in crowded
environments [3]. However, it is still difficult to
differentiate fighting from basketball playing using this
approach. It is also noted that both approaches ([2, 3]) use
video data from surveillance cameras and are not suitable
for movies, which have large camera movement. The second
category is the audio-based approach. Giannakopoulos et al.
used eight audio features, from both the time and frequency
domains, as input to a binary classifier which decides the
video content with respect to violence [4]. Then, they
extended their work to multi-
2011 Eighth International Conference Computer Graphics, Imaging and Visualization
978-0-7695-4484-7/11 $26.00 © 2011 IEEE
DOI 10.1109/CGIV.2011.14