Online Learning with Self-Organizing Maps for Anomaly Detection
in Crowd Scenes
Jie Feng¹ and Chao Zhang¹,*
¹Key Laboratory of Machine Perception
Peking University
Beijing, 100871, China
*e-mail: c.zhang@pku.edu.cn
Pengwei Hao¹,²
²Department of Computer Science
Queen Mary, University of London
E1 4NS, UK
e-mail: phao@dcs.qmul.ac.uk
Abstract—Detecting abnormal behaviors in crowd scenes is
important for public security and has received increasing
attention. Most previous methods perform detection with an
offline-trained model, which cannot handle a constantly
changing crowd environment. In this paper, we propose a
novel unsupervised algorithm that detects abnormal behavior
patterns in crowd scenes with online learning. The crowd
behavior pattern is extracted from the local spatio-temporal
volume, which consists of multiple motion patterns in temporal
order. An online self-organizing map (SOM) is used to model
the large number of behavior patterns in a crowd. Each neuron
can be updated by incrementally learning from new observations.
To demonstrate the effectiveness of the proposed method, we
performed experiments on real-world crowd scenes. Online
learning can efficiently reduce false alarms while
still detecting most of the anomalies.
Keywords-anomaly detection, SOM, online learning, crowd
I. INTRODUCTION
In recent years, automatic detection of abnormal
behaviors in crowd scenes has been in growing demand for
video surveillance in security-sensitive public areas such as
airports and subway stations. Traditional approaches to behavior
analysis [1, 2, 3] rely on segmenting and tracking
individuals in regular environments, but these methods
often fail in crowd scenes because of the large number of
people and frequent occlusions. The computer vision
community has shown increasing interest in analyzing crowd
behavior, and several methods have been proposed. Andrade et
al. [5] combined PCA, spectral clustering, and HMMs to model
crowd scenes and detect blocked-exit events in simulated
data. Mehran et al. [8] used a social force model to capture
interactions in a crowd, then applied LDA to model normal
behaviors and detect abnormal ones. Saxena et al. [9]
proposed selecting and tracking crowd feature points and
developed end-user scenarios to detect anomalies in crowds
instead of using a general crowd model.
To detect abnormal behaviors, the basic idea is to model
normal behaviors and perform detection based on that model.
We refer to behavior patterns that appear frequently as
normal behaviors and to those that appear rarely as abnormal
ones. Most previous methods for crowd analysis used only
offline-trained models, which do not change during detection.
In real-world crowd scenes, there are two major concerns for
anomaly detection. First, normal behaviors are often
multimodal, such as people walking in different directions,
and are observed sequentially. Second, the environment is
not stationary, so even the normal patterns change over
time. If the model is trained once and used everywhere, it
may fail to capture all types of normal patterns or to
recognize normal patterns that vary greatly from the
training samples. Online learning is therefore needed to
adapt the model to new incoming patterns and reduce false
positives.
In this paper, we propose an algorithm to tackle the
aforementioned issues. Crowd motion features are extracted
from video to construct behavior patterns. An online
self-organizing map (SOM) is applied to model the crowd scene.
Each of its neurons is a behavior pattern that is updated
according to new observations. Over time, the
model learns new behavior patterns, which makes it
more robust and reliable.
The rest of this paper is organized as follows. Section II
presents the crowd behavior pattern representation. Section III
describes how to use the SOM to learn from behavior
patterns in an online manner. Finally,
we give experimental results for the proposed algorithm.
II. CROWD BEHAVIOR PATTERN
A. Behavior Pattern Representation
In a crowd scene with many people, it is very difficult to
segment individuals. We therefore analyze motion in
local spatio-temporal volumes [13]. We first divide a scene
into blocks of a fixed size. Low-level motion information in
each frame is obtained using an optical flow technique [10].
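As a rough illustration of the per-block processing that this subsection describes (discarding flow vectors by magnitude, assigning the rest to fixed-size blocks, and fitting a Gaussian to the vectors of one clip), the following minimal Python sketch may help. It is not the authors' implementation. The function names, the `(col, row, dx, dy)` sample layout, and the parameters `block_size`, `min_mag`, and `max_mag` are all assumptions made for this example; the paper does not give concrete block sizes or thresholds. Optical flow computation and median filtering are assumed to have been done beforehand.

```python
from collections import defaultdict
from math import hypot


def fit_gaussian_ml(vectors):
    """Maximum-likelihood mean and covariance of a list of 2-D vectors."""
    n = len(vectors)
    mu_x = sum(dx for dx, _ in vectors) / n
    mu_y = sum(dy for _, dy in vectors) / n
    # The ML covariance estimate divides by n (not n - 1).
    s_xx = sum((dx - mu_x) ** 2 for dx, _ in vectors) / n
    s_yy = sum((dy - mu_y) ** 2 for _, dy in vectors) / n
    s_xy = sum((dx - mu_x) * (dy - mu_y) for dx, dy in vectors) / n
    return (mu_x, mu_y), ((s_xx, s_xy), (s_xy, s_yy))


def clip_motion_patterns(flow_samples, block_size, min_mag, max_mag):
    """Group the flow vectors of one clip by block and fit a Gaussian per block.

    flow_samples: iterable of (col, row, dx, dy) tuples (hypothetical layout),
    i.e. a pixel location and the flow vector observed there.
    Returns {(block_row, block_col): (mean, covariance)}.
    """
    per_block = defaultdict(list)
    for col, row, dx, dy in flow_samples:
        # Drop vectors whose magnitude is too small or too large.
        if not (min_mag <= hypot(dx, dy) <= max_mag):
            continue
        # Assign the vector to a block according to its location.
        block = (int(row) // block_size, int(col) // block_size)
        per_block[block].append((dx, dy))
    return {block: fit_gaussian_ml(vecs) for block, vecs in per_block.items()}


# Example call with made-up samples and thresholds.
samples = [(5, 5, 1.0, 0.0), (6, 5, 3.0, 0.0), (20, 5, 2.0, 2.0)]
patterns = clip_motion_patterns(samples, block_size=16, min_mag=0.5, max_mag=10.0)
```

A block that receives only one valid vector yields a zero covariance matrix in this sketch; a practical system would probably need a minimum sample count or some regularization, which is not addressed here.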
The obtained optical flow field is smoothed by median
filtering. Flow vectors whose magnitude is too small or too
large are removed to focus on valid motion rather than
stationary points or noise. Each flow vector $F = (x, y)$ is
assigned to a block according to its location. We further split the
volume into non-overlapping clips along the time axis. Each
clip is treated as a motion pattern M. We then fit a
two-dimensional Gaussian distribution $N(\mu, \Sigma)$ to the flow
vectors in the current clip, with parameters estimated by
maximum likelihood. Modeling a distribution keeps the
2010 International Conference on Pattern Recognition
1051-4651/10 $26.00 © 2010 IEEE
DOI 10.1109/ICPR.2010.878
3587