Graph modeling based video event detection
Najib Ben Aoun¹, Haytham Elghazel² and Chokri Ben Amar¹
¹ REGIM: REsearch Group on Intelligent Machines,
University of Sfax, National School of Engineers (ENIS),
BP 1173, 3038, Sfax, Tunisia
{Najib.benaoun, Chokri.benamar}@ieee.org
² GAMA laboratory,
University of Lyon, University of Lyon 1,
69622, Villeurbanne, France
Haytham.elghazel@univ-lyon1.fr
Abstract—Video processing and analysis have become an important
field in both research and industry. Information detection and
retrieval are challenging tasks, especially with the spread of
multimedia applications and the growing number of video acquisition
devices such as surveillance cameras and phone cameras. These
devices produce a large amount of video data that is both diverse
and complex, which makes event detection in video a difficult task.
Many video event detection methods have been developed, typically
composed of two fundamental parts: video indexing and video
classification. In this paper, we introduce a new video event
detection system based on graphs. Our system models each video
frame as a graph and complements it with a motion description.
These models are then classified and events are detected.
Experimental results prove the effectiveness and robustness of our
system.
Keywords—video event detection, video indexing,
graph modeling, Region Adjacency Graph.
I. INTRODUCTION
Today, a large number of video cameras are deployed all
over the world (in stations, airports, roads, etc.). These video
cameras are used for security, surveillance, archiving, and
organization purposes. The quantity of video acquired from
these cameras is very large, which makes its processing a
very hard task, especially given the variety (persons, cars, etc.)
and the complexity (fuzziness, noise, lighting, crowded and
dynamic environments, etc.) of the gathered videos.
Video event detection (VED) is a challenging task since it
aims to detect special events or activities, which can be used
to trigger alarms (detection) as well as to reduce the volume of
data presented to a human operator (retrieval). VED is a
fundamental part of many video processing and analysis systems
used in applications such as video surveillance, video
monitoring, traffic control, action recognition, video
summarization, and bio-surveillance [1, 3].
To detect an event in a video sequence, it is crucial to
characterize it efficiently, in a way that describes it well and
distinguishes it from other events. This is done by indexing the
video event with robust and strong features that capture the
spatio-temporal properties of the video. Based on these video
event features and a good classification method, a powerful
VED system can be built.
To this end, we have developed a VED system based on graph
modeling of the image as a spatial feature and a motion
descriptor as a temporal feature. These two features are combined
to form the video event feature. Then, the Support Vector
Machines (SVM) method is used for video event classification.
In this way, we have constructed a strong VED system which
has proved its efficacy and performance.
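As an illustration of this pipeline (not the authors' implementation), the sketch below derives a toy spatial descriptor from a region adjacency graph, concatenates it with a simple motion descriptor, and classifies the result. All function names are hypothetical, and a nearest-centroid classifier stands in for the SVM used in the paper.

```python
# Illustrative sketch, under simplifying assumptions: the adjacency graph
# is given as {region_id: set(neighbor_ids)}, frames are 2-D lists of
# grayscale intensities, and a toy nearest-centroid classifier replaces
# the SVM actually used in the paper.

def rag_descriptor(adjacency):
    """Summarize a region adjacency graph as a small feature vector."""
    n_regions = len(adjacency)
    n_edges = sum(len(nb) for nb in adjacency.values()) // 2
    avg_degree = (2 * n_edges / n_regions) if n_regions else 0.0
    return [float(n_regions), float(n_edges), avg_degree]

def motion_descriptor(prev_frame, frame):
    """Mean absolute intensity difference between consecutive frames."""
    diffs = [abs(a - b) for row_p, row_c in zip(prev_frame, frame)
             for a, b in zip(row_p, row_c)]
    return [sum(diffs) / len(diffs)]

def event_feature(adjacency, prev_frame, frame):
    # Spatial (graph) and temporal (motion) parts are concatenated,
    # mirroring the feature-combination step described in the text.
    return rag_descriptor(adjacency) + motion_descriptor(prev_frame, frame)

def nearest_centroid(train, labels, x):
    """Toy stand-in for the SVM classification stage."""
    groups = {}
    for feat, lab in zip(train, labels):
        groups.setdefault(lab, []).append(feat)
    means = {lab: [sum(col) / len(col) for col in zip(*feats)]
             for lab, feats in groups.items()}
    dist = lambda a, b: sum((u - v) ** 2 for u, v in zip(a, b))
    return min(means, key=lambda lab: dist(means[lab], x))
```

In a real system, the graph descriptor would be far richer (region attributes, edge weights) and the classifier would be a kernel SVM; the sketch only shows how the two feature families plug together.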
In this paper, we introduce the video event detection task
and present a state of the art of some VED systems in Sec. II.
Then, in Sec. III, we present our video event indexing based on
graph modeling. Sec. IV describes our proposed VED
system. Experimental results evaluating our system are given in
Sec. V, where we demonstrate its robustness and efficiency.
Finally, Sec. VI summarizes the main results and proposes
some future extensions and improvements to our system.
II. VIDEO EVENT DETECTION
Interest in the video event detection task has increased,
motivated by the growth of video data generated daily by
millions of video cameras all over the world and by the range
of potential applications that need a VED phase. The need for
semantic understanding of the visual content of video, and for
automating the event detection stage in many applications, has
encouraged many researchers to work on this problem.
The objective of the VED task is to temporally localize a
pre-defined event in a given video. The video event detection
process is generally conducted in two phases: video event
indexing by feature extraction, and video event classification.
Preprocessing techniques can be added to make event detection
more precise (background extraction, video segmentation, etc.).
A number of video event detection systems have been
developed following, in most cases, this procedure [3, 4].
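One common preprocessing step mentioned above is background extraction. A minimal sketch, assuming grayscale frames stored as 2-D lists, uses the per-pixel temporal median as the background model and simple thresholded differencing; real systems rely on more robust subtraction schemes.

```python
# Minimal background-extraction sketch (illustrative only):
# the background is the per-pixel temporal median over a frame buffer,
# and foreground pixels are those deviating from it beyond a threshold.
from statistics import median

def background_model(frames):
    """Per-pixel temporal median over a list of grayscale frames."""
    h, w = len(frames[0]), len(frames[0][0])
    return [[median(f[y][x] for f in frames) for x in range(w)]
            for y in range(h)]

def foreground_mask(frame, background, thresh=25):
    """1 where a pixel deviates from the background model, 0 elsewhere."""
    return [[1 if abs(p - b) > thresh else 0
             for p, b in zip(row_f, row_b)]
            for row_f, row_b in zip(frame, background)]
```

The threshold value and the median model are placeholder choices; adaptive per-pixel models (e.g., mixtures of Gaussians) are the usual production alternative.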
In some early approaches, [1] proposed to extract
event features by combining the shape and motion
properties of the video objects and to classify them with Hidden
Markov Models (HMM). In [2], a VED system is implemented
that combines conventional high-level spatial features with an
optical-flow-based motion feature to form a high-level video
event descriptor, classified later by a multi-class SVM
classifier. A similar approach was followed in [3], based on local
spatio-temporal feature modeling and motion features to detect
person-running events. Recently, Z.F. Huang and G. Mori
[4] produced a VED system based on moving region
detection (using background subtraction, optical flow, and
photogrammetric context) and human detection as a
preprocessing stage. Motion features are then
extracted and classified with AdaBoost to detect events.
2011 International Conference on Innovations in Information Technology
978-1-4577-0314-0/11/$26.00 ©2011 IEEE