Online Learning with Self-Organizing Maps for Anomaly Detection in Crowd Scenes

Jie Feng 1 and Chao Zhang 1,*
1 Key Laboratory of Machine Perception, Peking University, Beijing, 100871, China
*e-mail: c.zhang@pku.edu.cn

Pengwei Hao 1,2
2 Department of Computer Science, Queen Mary, University of London, E1 4NS, UK
e-mail: phao@dcs.qmul.ac.uk

Abstract—Detecting abnormal behaviors in crowd scenes is important for public security and has attracted increasing attention. Most previous methods perform detection with an offline-trained model, which cannot handle the constantly changing crowd environment. In this paper, we propose a novel unsupervised algorithm that detects abnormal behavior patterns in crowd scenes through online learning. The crowd behavior pattern is extracted from a local spatio-temporal volume, which consists of multiple motion patterns in temporal order. An online self-organizing map (SOM) is used to model the large number of behavior patterns in the crowd, and each neuron is updated by incrementally learning from new observations. To demonstrate the effectiveness of the proposed method, we performed experiments on real-world crowd scenes. The online learning efficiently reduces false alarms while still detecting most of the anomalies.

Keywords-anomaly detection, SOM, online learning, crowd

I. INTRODUCTION

In recent years, automatically detecting abnormal behaviors in crowd scenes has become increasingly important for video surveillance in security-sensitive public areas such as airports and subway stations. Traditional approaches to behavior analysis [1, 2, 3] rely on segmenting and tracking individuals in ordinary, uncrowded environments, but these methods tend to fail in crowd scenes because of the large number of people and frequent occlusions. The computer vision community has shown increasing interest in analyzing crowd behavior, and several methods have been proposed. Andrade et al.
[5] combine PCA, spectral clustering, and HMMs to model crowd scenes and detect blocked-exit events in simulated data. Mehran et al. [8] used a social force model to capture interactions in crowds, then applied LDA to model normal behaviors and detect abnormal ones. Saxena et al. [9] proposed selecting and tracking crowd feature points and developed end-user scenarios for detecting anomalies in crowds instead of using a general crowd model.

To detect abnormal behaviors, the basic idea is to model normal behaviors and perform detection with respect to them. We refer to behavior patterns that appear frequently as normal behaviors and to those that appear rarely as abnormal ones. Most previous methods for crowd analysis use only offline-trained models, which do not change during detection. In real-world crowd scenes, there are two major concerns in anomaly detection. First, normal behaviors are often multimodal (for example, people walking in different directions) and are observed sequentially. Second, the environment is rarely stationary, so even the normal patterns change over time. A model that is trained once and then used everywhere may fail to capture all types of normal patterns, and may fail to recognize normal patterns that differ substantially from the training samples. Online learning is therefore needed to adapt the model to new incoming patterns and to reduce false positives.

In this paper, we propose an algorithm to tackle these issues. Crowd motion features are extracted from video to construct behavior patterns. An online self-organizing map (SOM) is applied to model the crowd scene. Each of its neurons represents a behavior pattern and is updated according to new observations. As time goes by, the model learns new behavior patterns, which makes it more robust and reliable.
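The online update described above can be illustrated with a generic Kohonen-style SOM. The sketch below is a minimal illustration under stated assumptions, not the authors' exact formulation: each behavior pattern is assumed to be flattened into a fixed-length feature vector, the neurons lie on a small 2-D grid, the best-matching unit (BMU) and its grid neighbors are pulled toward each new observation with a fixed learning rate and a Gaussian neighborhood, and an observation is flagged as abnormal when its distance to the BMU exceeds a threshold. The class `OnlineSOM`, its parameters (`learning_rate`, `sigma`, `threshold`) and the synthetic training data are hypothetical.

```python
import math
import random


class OnlineSOM:
    """Hypothetical online (Kohonen-style) SOM on a rows x cols grid.

    Each neuron stores a prototype vector standing in for one behavior
    pattern. Update rule, Euclidean distance and anomaly threshold are
    illustrative assumptions, not the paper's exact formulation.
    """

    def __init__(self, rows, cols, dim, learning_rate=0.1, sigma=1.0, seed=0):
        rng = random.Random(seed)
        self.cols = cols
        self.learning_rate = learning_rate  # fixed, so the map keeps adapting
        self.sigma = sigma  # width of the Gaussian neighborhood on the grid
        # One prototype per neuron, initialized uniformly in [0, 1)^dim.
        self.weights = [[rng.random() for _ in range(dim)]
                        for _ in range(rows * cols)]

    def best_matching_unit(self, x):
        """Return (index, Euclidean distance) of the neuron closest to x."""
        dists = [math.dist(w, x) for w in self.weights]
        idx = min(range(len(dists)), key=dists.__getitem__)
        return idx, dists[idx]

    def update(self, x):
        """Pull the BMU and its grid neighbors toward observation x."""
        bmu, _ = self.best_matching_unit(x)
        br, bc = divmod(bmu, self.cols)
        for i, w in enumerate(self.weights):
            r, c = divmod(i, self.cols)
            grid_d2 = (r - br) ** 2 + (c - bc) ** 2
            h = math.exp(-grid_d2 / (2 * self.sigma ** 2))  # neighborhood weight
            step = self.learning_rate * h
            for k in range(len(w)):
                w[k] += step * (x[k] - w[k])
        return bmu

    def is_anomalous(self, x, threshold):
        """Flag x when even its best-matching neuron is farther than threshold."""
        _, d = self.best_matching_unit(x)
        return d > threshold


# Usage: learn a cluster of synthetic "normal" 2-D patterns one at a time.
som = OnlineSOM(rows=3, cols=3, dim=2, learning_rate=0.3, sigma=1.0)
rng = random.Random(1)
for _ in range(200):
    som.update((0.2 + 0.02 * rng.random(), 0.8 + 0.02 * rng.random()))

print(som.is_anomalous((0.21, 0.81), threshold=0.1))  # -> False (near the cluster)
print(som.is_anomalous((5.0, -5.0), threshold=0.1))   # -> True (far from every neuron)
```

Keeping the learning rate fixed, rather than decaying it to zero as in conventional offline SOM training, lets the prototypes keep tracking slowly drifting normal patterns; the trade-off is that a pattern which persists long enough will eventually be absorbed as normal. How the paper sets its learning rate, neighborhood and threshold is not specified in this section.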
The structure of this paper is as follows: Section II presents the crowd behavior pattern representation; Section III describes how an SOM learns from behavior patterns in an online manner; and the final part gives experimental results for the proposed algorithm.

II. CROWD BEHAVIOR PATTERN

A. Behavior Pattern Representation

In a crowd scene with many people, it is very difficult to segment individuals. We therefore analyze motion in local spatio-temporal volumes [13]. We first divide the scene into blocks of a fixed size. Low-level motion information in each frame is obtained with optical flow [10]. The resulting optical flow field is smoothed by median filtering, and flow vectors with too small or too large a magnitude are removed, so that the analysis focuses on valid motion rather than on stationary points or noise. Each flow vector $F_i = (x_i, y_i)$ is assigned to a block according to its location. We further split each block's spatio-temporal volume into non-overlapping clips along the time axis, and each clip is treated as a motion pattern $M$. We then fit a two-dimensional Gaussian distribution $\mathcal{N}(\mu, \Sigma)$ to the flow vectors in the current clip, with parameters estimated by maximum likelihood, i.e. $\mu = \frac{1}{n}\sum_{i=1}^{n} F_i$ and $\Sigma = \frac{1}{n}\sum_{i=1}^{n} (F_i - \mu)(F_i - \mu)^\top$ over the $n$ flow vectors in the clip. Modeling a distribution keeps the

2010 International Conference on Pattern Recognition 1051-4651/10 $26.00 © 2010 IEEE DOI 10.1109/ICPR.2010.878 3587