UNDERSTANDING DYNAMIC SCENES BY HIERARCHICAL MOTION PATTERN MINING

Lei Song 1, Fan Jiang 2, Zhongke Shi 1, Aggelos K. Katsaggelos 2
1 School of Automation, Northwestern Polytechnical University, Xi’an, 710072, China
2 Dept of EECS, Northwestern University, Evanston, 60208, USA
songlei@mail.nwpu.edu.cn, {ffji295, aggk}@eecs.northwestern.edu, zkeshi@nwpu.edu.cn

ABSTRACT

Our work addresses the problem of analyzing and understanding dynamic video scenes. A two-level motion pattern mining approach is proposed. At the first level, single-agent motion patterns are modeled as distributions over pixel-based features. At the second level, interaction patterns are modeled as distributions over single-agent motion patterns. Both patterns are shared among video clips. Compared to other works, the advantage of our method is that interaction patterns are detected and assigned to every video frame. This enables a finer semantic interpretation and more precise anomaly detection. Specifically, every video frame is labeled by a certain interaction pattern and moving pixels in each frame which do not belong to any single-agent pattern or cannot exist in the corresponding interaction pattern are detected as anomalies. We have tested our approach on a challenging traffic surveillance sequence containing both pedestrian and vehicular motions and obtained promising results.

Index Terms—Visual surveillance, LDA, motion pattern analysis, anomaly detection

1. INTRODUCTION

In many surveillance scenarios, such as those involving a crowded traffic scene, a busy train station, or a shopping mall, various motions are involved. It is highly desirable to analyze the motion patterns and obtain some high-level interpretation of the semantic content.
For example, in a video monitoring a traffic intersection, without any prior knowledge of the traffic rules in the specific scene, it is useful to discover typical vehicle behaviors and their dependencies, and to detect anomalous motion for security purposes.

Motion patterns in a complex dynamic scene usually have a hierarchical nature. Typically, many objects (e.g., vehicles) are involved in the video scene. The motion of each single object might follow some regular streams, which are single-agent motion patterns. In addition, the co-occurrence of multiple objects at the same time might also be subject to constraints, which define interaction patterns. For example, in the traffic intersection scenario, the single-agent motion patterns are all the legal paths going through the intersection (shown in Fig. 1 (a) and numbered from 1 to 7), while the interaction patterns are the possible combinations of paths determined by the traffic lights (shown in Fig. 1 (b) as combinations 1 and 2).

Fig. 1. (a) Single-agent motion patterns and (b) interaction patterns.

The work of A. K. Katsaggelos was supported in part by a grant from the US Department of Energy (DE-NA0000431).

Considering this hierarchical nature of motion patterns, many works on scene understanding and motion pattern discovery are based on hierarchical modeling. One common approach is based on object trajectory analysis. Typically, objects are tracked in the video, and an analysis and mining approach is applied to the object trajectories to discover motion patterns. For example, Jiang et al. [2] use an HMM to characterize object trajectories, and a BIC-based dissimilarity measure is used for clustering highly recurrent events. Duong et al. [3] introduce the Switching Hidden Semi-Markov Model for atomic activity modeling, and high-level activities are modeled as sequences of atomic activities. Jiang et al.
[4] characterize crowded motion by a patch-based local motion representation, and cluster all patches into different motion patterns by spectral clustering. Basharat et al. [5] detect abnormal events based on the local and global behavior of tracks. Instead of clustering tracks into major paths, they build local pixel-level probability density functions that capture a variety of tracks.

978-1-61284-350-6/11/$26.00 ©2011 IEEE

However, object tracking methods are sensitive to object detection, recognition, and tracking errors, and they usually fail in complicated or crowded scenes due primarily