Event Detection and Clustering for Surveillance Video Summarization
Uros Damnjanovic
Queen Mary University of London
uros.damnjanovic@elec.qmul.ac.uk
Virginia Fernandez
Universidad Autónoma de Madrid
virginia.fernandeza@estudiante.uam.es
Ebroul Izquierdo
Queen Mary University of London
ebroul.izquierdo@elec.qmul.ac.uk
José María Martinez
Universidad Autónoma de Madrid
josem.martinez@uam.es
Abstract
The target of surveillance summarization is to identify high-value information events in a video stream and to present them to a user. In this paper we present a surveillance summarization approach based on the detection and clustering of important events. Assuming that events are the main source of energy change between consecutive frames, a set of interesting frames is extracted and then clustered. Based on the structure of the clusters, two types of summaries are created: static and dynamic. The static summary is built from key frames organized in clusters. The dynamic summary is created from short video segments representing each cluster and is used to lead the user to the events of interest captured in the key frames. We describe our approach and present experimental results.
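The pipeline summarized in the abstract (measure the energy change between consecutive frames, keep the frames where it is high, then cluster those frames by appearance) can be sketched in a few lines of Python. This is a minimal sketch under stated assumptions, not the authors' implementation: frames are taken to be grayscale images given as nested lists of 0-255 intensities, energy change is measured as the mean squared intensity difference, a frame is treated as interesting when its energy exceeds the mean plus `k` standard deviations of all energy values, and visual appearance is represented by a normalized intensity histogram grouped with k-means. The thresholding rule, descriptor, clustering algorithm, and all names (`detect_event_frames`, `cluster_event_frames`, `k`, `bins`) are hypothetical.

```python
from statistics import mean, pstdev
from typing import Dict, List, Sequence

# Assumed representation: a grayscale frame as rows of 0-255 intensities.
Frame = Sequence[Sequence[int]]


def frame_energy_difference(prev: Frame, cur: Frame) -> float:
    """Mean squared intensity difference between two consecutive frames."""
    diffs = [(c - p) ** 2 for row_p, row_c in zip(prev, cur) for p, c in zip(row_p, row_c)]
    return sum(diffs) / len(diffs) if diffs else 0.0


def detect_event_frames(frames: Sequence[Frame], k: float = 1.0) -> List[int]:
    """Indices of frames whose energy change exceeds mean + k * std (assumed rule)."""
    energies = [frame_energy_difference(frames[i - 1], frames[i]) for i in range(1, len(frames))]
    if not energies:
        return []
    threshold = mean(energies) + k * pstdev(energies)
    # energies[0] belongs to frame 1, so enumerate from 1 to recover frame indices
    return [i for i, e in enumerate(energies, start=1) if e > threshold]


def intensity_histogram(frame: Frame, bins: int = 8) -> List[float]:
    """Normalized intensity histogram, a simple stand-in for visual appearance."""
    hist = [0.0] * bins
    pixels = [v for row in frame for v in row]
    for v in pixels:
        hist[min(int(v) * bins // 256, bins - 1)] += 1
    return [h / len(pixels) for h in hist] if pixels else hist


def sq_dist(a: Sequence[float], b: Sequence[float]) -> float:
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans(points: List[List[float]], n_clusters: int, iters: int = 20) -> List[int]:
    """Plain k-means with deterministic farthest-point initialization."""
    if not points:
        return []
    centroids = [points[0]]
    while len(centroids) < min(n_clusters, len(points)):
        centroids.append(max(points, key=lambda p: min(sq_dist(p, c) for c in centroids)))
    labels: List[int] = []
    for _ in range(iters):
        # assign each point to its nearest centroid
        labels = [min(range(len(centroids)), key=lambda j: sq_dist(p, centroids[j])) for p in points]
        updated = []
        for j, c in enumerate(centroids):
            members = [p for p, lab in zip(points, labels) if lab == j]
            updated.append([sum(col) / len(members) for col in zip(*members)] if members else c)
        if updated == centroids:  # converged
            break
        centroids = updated
    return labels


def cluster_event_frames(frames: Sequence[Frame], event_indices: List[int],
                         n_clusters: int = 2, bins: int = 8) -> Dict[int, List[int]]:
    """Group event frames by appearance: cluster label -> frame indices."""
    features = [intensity_histogram(frames[i], bins) for i in event_indices]
    clusters: Dict[int, List[int]] = {}
    for idx, lab in zip(event_indices, kmeans(features, n_clusters)):
        clusters.setdefault(lab, []).append(idx)
    return clusters


if __name__ == "__main__":
    dark = [[10] * 4 for _ in range(4)]
    bright = [[200] * 4 for _ in range(4)]
    video = [dark] * 3 + [bright] * 3 + [dark] * 3 + [bright] * 3
    events = detect_event_frames(video)
    print(events, cluster_event_frames(video, events))  # [3, 6, 9] {0: [3, 9], 1: [6]}
```

In the toy video the scene switches between a dark and a bright view at frames 3, 6 and 9; the two switches to the bright view fall into one cluster and the switch to the dark view into another. A static summary in this spirit would show one key frame per cluster, and a dynamic summary would play a short segment around each.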
1. Introduction
Nowadays, interest in civil, military and commercial surveillance is growing due to the increasing demand for security. Thousands of video cameras can be found in public places, public transport, banks, airports, etc., producing huge amounts of information that are difficult to process in real time. To efficiently organize growing stocks of surveillance videos, it is necessary to organize the data automatically using signal-based representations. Video summarization techniques can be a very useful tool when applied to surveillance videos. The main objective of video summarization is to identify interesting segments in the video and present them to the user. Applied to the surveillance domain, summarization techniques can provide the user with both an overview of the events that occurred and faster browsing capabilities. By detecting and organizing events, the summary captures the essence of the surveillance video, decreasing the time needed to browse the content.
Even though surveillance systems have been in use for decades, most publications related to the surveillance domain have appeared only in the last few years. Event detection and classification are the topics most often addressed in the literature. In [1], an object detection technique based on wavelet coefficients is used to detect frontal and rear views of pedestrians. In [2], two different architectures that employ summarization techniques in the surveillance domain are described. Video summarization based on the optimization of viewing time, frame skipping and bit rate constraints is presented in [3]. For a given temporal rate constraint, the optimal video summary problem is defined as finding a predefined number of frames that minimize the temporal distortion. In [4], the authors present a tool that uses MPEG-7 visual descriptors and generates a video index for summary creation. The resulting index generates a preview of the movie and allows non-linear access to the content. This approach is based on hierarchical clustering that merges shot segments which have similar features and are adjacent in time. In [5], Rasheed and Shah construct a shot similarity graph and use normalized-cut graph partitioning to cluster shots into scenes. Video motion analysis can also be used to create video summaries, as in [6], where Wang et al. show that analyzing global/camera motion and object motion makes it possible to extract useful information about the video structure. A more complete overview of existing techniques and the available literature on intelligent surveillance systems can be found in [7] and [8].
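The optimal-summary formulation attributed to [3], choosing a predefined number of frames that minimize temporal distortion, can be illustrated with a small dynamic program. This is a hedged sketch, not the method of [3]: we assume each frame is described by a feature vector, every skipped frame is displayed as the most recently kept frame, temporal distortion is the sum of squared feature distances between skipped frames and their substitutes, and the first frame is always kept. The distortion measure and the names `select_frames` and `sq_dist` are hypothetical; [3] may define distortion and constraints differently.

```python
from typing import List, Sequence, Tuple


def sq_dist(a: Sequence[float], b: Sequence[float]) -> float:
    return sum((x - y) ** 2 for x, y in zip(a, b))


def select_frames(features: Sequence[Sequence[float]], k: int) -> Tuple[List[int], float]:
    """Keep k frames (frame 0 always kept) minimizing summed frame-repeat distortion."""
    n = len(features)
    if not 1 <= k <= n:
        raise ValueError("k must be between 1 and the number of frames")
    # seg[i][j]: distortion of frames i+1..j-1 when each is shown as frame i
    seg = [[0.0] * (n + 1) for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            seg[i][j + 1] = seg[i][j] + sq_dist(features[j], features[i])
    inf = float("inf")
    # best[m][j]: minimal distortion of frames 0..j-1 when frame j is the m-th kept frame
    best = [[inf] * n for _ in range(k + 1)]
    prev = [[-1] * n for _ in range(k + 1)]
    best[1][0] = 0.0
    for m in range(2, k + 1):
        for j in range(1, n):
            for i in range(j):
                cost = best[m - 1][i] + seg[i][j]
                if cost < best[m][j]:
                    best[m][j], prev[m][j] = cost, i
    # frames after the last kept frame repeat it until the end of the video
    total, last = min((best[k][j] + seg[j][n], j) for j in range(n))
    kept, m, j = [], k, last
    while j != -1:  # walk the back-pointers to recover the kept frames
        kept.append(j)
        j = prev[m][j]
        m -= 1
    return kept[::-1], total
```

For six frames with one-dimensional features 0, 0, 0, 10, 10, 10 and a budget of two frames, the sketch keeps frames 0 and 3 with zero distortion, whereas a budget of one frame forces frames 3 to 5 to be shown as frame 0. The search runs in O(kn^2) time, which suffices for illustration but would need pruning or windowing for long surveillance streams.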
In this paper we present an event detection and clustering approach for building both static and dynamic summaries. The main idea of our approach is to combine a video skim with a set of key frames organized in clusters to enable fast browsing of the whole video. To create the summary, we first detect events using the energy difference between frames. Then we cluster the events based on their visual appearance, and finally, based on the cluster structure, we build the summary and
Ninth International Workshop on Image Analysis for Multimedia Interactive Services
978-0-7695-3130-4/08 $25.00 © 2008 IEEE
DOI 10.1109/WIAMIS.2008.53