Graph modeling based video event detection
Najib Ben Aoun¹, Haytham Elghazel² and Chokri Ben Amar¹
¹ REGIM: REsearch Group on Intelligent Machines,
University of Sfax, National School of Engineers (ENIS),
BP 1173, 3038, Sfax, Tunisia
{Najib.benaoun, Chokri.benamar}@ieee.org
² GAMA laboratory,
University of Lyon, University of Lyon 1,
69622, Villeurbanne, France
Haytham.elghazel@univ-lyon1.fr
Abstract—Video processing and analysis have become an important
field in both research and industry. Information detection and
retrieval are challenging tasks, especially with the spread of
multimedia applications and the growing number of video acquisition
devices such as surveillance cameras and phone cameras. These
devices produce a large amount of video data that is both diverse
and complex, which makes event detection in video a difficult task.
Many video event detection methods have been developed, typically
composed of two fundamental parts: video indexing and video
classification. In this paper, we introduce a new video event
detection system based on graphs. Our system models each video
frame as a graph and complements it with a motion description.
These models are then classified and events are detected.
Experimental results prove the effectiveness and robustness of our
system.
Keywords—video event detection, video indexing,
graph modeling, Region Adjacency Graph.
I. INTRODUCTION
Today, a large number of video cameras are deployed all
over the world (in stations, airports, roads, etc.). These video
cameras are used for security, surveillance, archiving, and
organization purposes. The quantity of video acquired from
these cameras is very large, which makes its processing a
very hard task, especially given the variety (persons, cars, etc.)
and the complexity (fuzziness, noise, lighting, crowded and
dynamic environments, etc.) of the gathered videos.
Video event detection (VED) is a challenging task since it
aims to detect special events or activities, which can be used
to trigger alarms (detection) as well as to reduce the volume of
data presented to a human operator (retrieval). VED is a
fundamental part of many video processing and analysis systems
used in applications such as video surveillance, video
monitoring, traffic control, action recognition, video
summarization, and bio-surveillance [1, 3].
To detect an event in a video sequence, it is crucial to
characterize it efficiently, in a way that describes it well and
distinguishes it from other events. This is done by indexing the
video event with robust and strong features that capture the
spatio-temporal properties of the video. Based on these video
event features and a good classification method, a powerful
VED system can be built.
To this end, we have developed a VED system based on graph
modeling of the image as a spatial feature and a motion
descriptor as a temporal feature. These two features are combined
to form the video event feature. Then, the Support Vector
Machines (SVM) method is used for video event classification.
In this way, we have constructed a strong VED system which
has proved its efficacy and performance.
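As an illustration of this pipeline (not the authors' implementation), the sketch below derives a toy spatial descriptor from a region adjacency graph, concatenates it with a simple motion descriptor, and classifies the result. All function names are hypothetical, and a nearest-centroid classifier stands in for the SVM used in the paper.

```python
# Illustrative sketch, under simplifying assumptions: the adjacency graph
# is given as {region_id: set(neighbor_ids)}, frames are 2-D lists of
# grayscale intensities, and a toy nearest-centroid classifier replaces
# the SVM actually used in the paper.

def rag_descriptor(adjacency):
    """Summarize a region adjacency graph as a small feature vector."""
    n_regions = len(adjacency)
    n_edges = sum(len(nb) for nb in adjacency.values()) // 2
    avg_degree = (2 * n_edges / n_regions) if n_regions else 0.0
    return [float(n_regions), float(n_edges), avg_degree]

def motion_descriptor(prev_frame, frame):
    """Mean absolute intensity difference between consecutive frames."""
    diffs = [abs(a - b) for row_p, row_c in zip(prev_frame, frame)
             for a, b in zip(row_p, row_c)]
    return [sum(diffs) / len(diffs)]

def event_feature(adjacency, prev_frame, frame):
    # Spatial (graph) and temporal (motion) parts are concatenated,
    # mirroring the feature-combination step described in the text.
    return rag_descriptor(adjacency) + motion_descriptor(prev_frame, frame)

def nearest_centroid(train, labels, x):
    """Toy stand-in for the SVM classification stage."""
    groups = {}
    for feat, lab in zip(train, labels):
        groups.setdefault(lab, []).append(feat)
    means = {lab: [sum(col) / len(col) for col in zip(*feats)]
             for lab, feats in groups.items()}
    dist = lambda a, b: sum((u - v) ** 2 for u, v in zip(a, b))
    return min(means, key=lambda lab: dist(means[lab], x))
```

In a real system, the graph descriptor would be far richer (region attributes, edge weights) and the classifier would be a kernel SVM; the sketch only shows how the two feature families plug together.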
In this paper, we introduce the video event detection task
and present a state of the art of some VED systems in Sec. II.
Then, in Sec. III, we present our video event indexing based on
graph modeling. Sec. IV describes our proposed VED
system. Experimental results evaluating our system are given in
Sec. V, where we demonstrate its robustness and efficiency.
Finally, Sec. VI summarizes the main results and proposes
some future extensions and improvements to our system.
II. VIDEO EVENT DETECTION
Interest in the video event detection task has increased,
motivated by the growth of video data generated daily by
millions of video cameras all over the world and by the range
of potential applications that need a VED phase. The need for
semantic understanding of the visual content of video, and for
automating the event detection stage in many applications, has
encouraged many researchers to work on this problem.
The objective of the VED task is to temporally localize a
pre-defined event in a given video. The video event detection
process is generally conducted in two phases: video event
indexing by feature extraction, and video event classification.
Preprocessing techniques can be added to make event detection
more precise (background extraction, video segmentation, etc.).
A number of video event detection systems have been
developed following, in most cases, this procedure [3, 4].
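One common preprocessing step mentioned above is background extraction. A minimal sketch, assuming grayscale frames stored as 2-D lists, uses the per-pixel temporal median as the background model and simple thresholded differencing; real systems rely on more robust subtraction schemes.

```python
# Minimal background-extraction sketch (illustrative only):
# the background is the per-pixel temporal median over a frame buffer,
# and foreground pixels are those deviating from it beyond a threshold.
from statistics import median

def background_model(frames):
    """Per-pixel temporal median over a list of grayscale frames."""
    h, w = len(frames[0]), len(frames[0][0])
    return [[median(f[y][x] for f in frames) for x in range(w)]
            for y in range(h)]

def foreground_mask(frame, background, thresh=25):
    """1 where a pixel deviates from the background model, 0 elsewhere."""
    return [[1 if abs(p - b) > thresh else 0
             for p, b in zip(row_f, row_b)]
            for row_f, row_b in zip(frame, background)]
```

The threshold value and the median model are placeholder choices; adaptive per-pixel models (e.g., mixtures of Gaussians) are the usual production alternative.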
In some early approaches, [1] proposed to extract
event features by combining the shape and motion
properties of the video objects and to classify them with Hidden
Markov Models (HMM). In [2], a VED system is implemented
that combines conventional high-level spatial features with an
optical-flow-based motion feature to form a high-level video
event descriptor, classified later by a multi-class SVM
classifier. A similar approach was followed in [3], based on local
spatio-temporal feature modeling and motion features to detect
person-running events. Recently, Z.F. Huang and G. Mori
[4] produced a VED system based on moving region
detection (using background subtraction, optical flow, and
photogrammetric context) and human detection as a
preprocessing stage. Motion features are then
extracted and classified with AdaBoost to detect events.
2011 International Conference on Innovations in Information Technology
978-1-4577-0314-0/11/$26.00 ©2011 IEEE