Event Detection and Clustering for Surveillance Video Summarization

Uros Damnjanovic, Queen Mary University of London, uros.damnjanovic@elec.qmul.ac.uk
Virginia Fernandez, Universidad Autónoma de Madrid, virginia.fernandeza@estudiante.uam.es
Ebroul Izquierdo, Queen Mary University of London, ebroul.izquierdo@elec.qmul.ac.uk
José María Martinez, Universidad Autónoma de Madrid, josem.martinez@uam.es

Abstract

The goal of surveillance summarization is to identify high-value events in a video stream and to present them to the user. In this paper we present a surveillance summarization approach based on the detection and clustering of important events. Assuming that events are the main source of energy change between consecutive frames, a set of interesting frames is extracted and then clustered. Based on the structure of the clusters, two types of summary are created: static and dynamic. The static summary is built from key frames organized in clusters. The dynamic summary is created from short video segments representing each cluster and is used to lead the user to the event of interest captured in the key frames. We describe our approach and present experimental results.

1. Introduction

Interest in civil, military and commercial surveillance is growing due to the increasing demand for security. Thousands of video cameras can be found in public places, on public transport, in banks, airports, etc., resulting in a huge amount of information that is difficult to process in real time. To organize the growing stocks of surveillance video efficiently, it is necessary to organize the data automatically using a signal-based representation. Video summarization techniques can be a very useful tool when applied to surveillance video. The main objective of video summarization is to identify interesting segments in the video and present them to the user.
Applied to the surveillance domain, summarization techniques can provide the user with both an overview of the events that occurred and faster browsing capabilities. By detecting and organizing events, the essence of the surveillance video is captured in the summary, decreasing the time needed to browse the content.

Even though surveillance systems have been in use for decades, publications related to the surveillance domain have mostly been written in the last few years. Detection and classification of events is the approach most often used in the literature. An object detection technique based on wavelet coefficients is used to detect frontal and rear views of pedestrians in [1]. In [2], two different architectures that employ summarization techniques in the surveillance domain are described. Video summarization based on the optimization of viewing time, frame skipping and a bit rate constraint is presented in [3]. For a given temporal rate constraint, the optimal video summary problem is defined as finding a predefined number of frames that minimize the temporal distortion. In [4], the authors present a tool that utilizes MPEG-7 visual descriptors and generates a video index for summary creation. The resulting index generates a preview of the movie and allows non-linear access to the content. This approach is based on hierarchical clustering for merging shot segments that have similar features and are adjacent in time. In [5], Rasheed and Shah construct a shot similarity graph and use normalized-cut graph partitioning to cluster shots into scenes. Video motion analysis can also be used to create video summaries, as in [6]. In this approach, Wang et al. showed that by analyzing global (camera) motion and object motion it is possible to extract useful information about the video structure. A more complete overview of existing techniques and the available literature on intelligent surveillance systems can be found in [7] and [8].
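As a rough illustration of the assumption stated in the abstract, that events are the main source of energy change between consecutive frames, the following minimal sketch flags frames whose energy differs from that of the previous frame by more than a threshold. The energy definition (mean squared pixel intensity), the threshold value, and the names frame_energy and detect_event_frames are assumptions made for illustration only; they are not taken from this paper.

```python
from typing import List, Sequence

# A grayscale frame represented as rows of pixel intensities.
Frame = Sequence[Sequence[float]]


def frame_energy(frame: Frame) -> float:
    """Mean squared pixel intensity (an assumed energy definition)."""
    total = 0.0
    count = 0
    for row in frame:
        for pixel in row:
            total += pixel * pixel
            count += 1
    return total / count if count else 0.0


def detect_event_frames(frames: Sequence[Frame], threshold: float) -> List[int]:
    """Return indices of frames whose energy changed by more than
    `threshold` relative to the previous frame (hypothetical criterion)."""
    energies = [frame_energy(f) for f in frames]
    return [
        i
        for i in range(1, len(energies))
        if abs(energies[i] - energies[i - 1]) > threshold
    ]


# Tiny synthetic example: a static dark scene in which a bright object
# appears at frame 2 and disappears at frame 4.
background = [[10, 10], [10, 10]]
with_object = [[10, 200], [10, 10]]
video = [background, background, with_object, with_object, background]
event_frames = detect_event_frames(video, threshold=50.0)
# event_frames == [2, 4]
```

In this toy sequence only the frames where the object appears or disappears are flagged; frames in which the scene is unchanged produce no energy difference. A practical system would likely need a noise-robust threshold, which this sketch does not attempt.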
Ninth International Workshop on Image Analysis for Multimedia Interactive Services. 978-0-7695-3130-4/08 $25.00 © 2008 IEEE. DOI 10.1109/WIAMIS.2008.53

In this paper we present an event detection and clustering approach for building both static and dynamic summaries. The main idea of our approach is to combine a video skim with a set of key frames organized in clusters to enable fast browsing of the whole video. To create the summary, we first detect events using the energy difference between frames. Then we cluster the events based on their visual appearance, and finally, based on the structure of the clusters, we build the summary and