One-Shot Learning for Surveillance Anomaly Recognition using Siamese 3D CNN
Amin Ullah
Intelligent Media Laboratory, Digital Contents Research Institute, Sejong University, Seoul, South Korea
aminullah@ieee.org
Khan Muhammad
Intelligent Media Laboratory, Digital Contents Research Institute, Sejong University, Seoul, South Korea
khan.muhammad@ieee.org
Kilichbek Haydarov
Intelligent Media Laboratory, Digital Contents Research Institute, Sejong University, Seoul, South Korea
kilichbek.haydarov@gmail.com
Ijaz Ul Haq
Intelligent Media Laboratory, Digital Contents Research Institute, Sejong University, Seoul, South Korea
ijazulhaq@ieee.org
Miyoung Lee
Intelligent Media Laboratory, Digital Contents Research Institute, Sejong University, Seoul, South Korea
miylee@gmail.com
Sung Wook Baik*
Intelligent Media Laboratory, Digital Contents Research Institute, Sejong University, Seoul, South Korea
sbaik@sejong.ac.kr
Abstract—One-shot image recognition has been explored for many applications in the computer vision community. However, its application to video analytics has not yet been deeply investigated. For instance, surveillance anomaly recognition is an open and challenging problem, and one of its hurdles is the lack of accurate temporally annotated data. This paper addresses this data scarcity using a one-shot learning strategy and proposes an anomaly recognition framework that exploits a 3D CNN siamese network to yield the similarity between two anomaly sequences. This paper also investigates existing 3D CNNs for this task and then proposes a lightweight 3D CNN model that efficiently handles one-shot anomaly recognition. Once the network is trained, its discriminative 3D CNN features can be used to predict anomalies not only for new data but also for entirely new classes. The proposed model is trained using the temporally annotated test set of the UCF Crime dataset. Finally, the trained model is used to recognize anomalies and produce automatic temporal labels for the weakly annotated (video-level) training set of the dataset.
Keywords—Artificial intelligence, deep learning,
convolutional neural network, anomaly recognition,
siamese network, one-shot learning.
I. INTRODUCTION
Surveillance cameras are one of the most reliable sources for investigating crime and anomaly scenes. Moreover, advances in computer vision and artificial intelligence have taken this a step further by detecting and recognizing anomalies in real time, enabling instantaneous reporting systems [1]. Most high-performing methods are based on deep neural network architectures that rely on massive annotated video datasets and powerful computational resources for training [2]. In addition, these models must be retrained whenever a new class is added to a classification task [3]. These requirements make training neural networks difficult. In such scenarios, one-shot learning offers a potential solution: it learns to perform a classification task from only a single sample of each possible class, even when data are scarce [4]. Learning under this constraint removes the need to retrain models for new classes and facilitates learning in dynamically changing data environments [5].
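The property described above (classifying from a single labelled example per class, and adding classes without retraining) can be illustrated with a minimal nearest-neighbour sketch. It assumes that a trained network has already mapped each clip to a feature vector. The toy vectors, the class names, and the use of cosine similarity in place of a learned siamese similarity are all illustrative assumptions, not the method of this paper.

```python
import math

def cosine_similarity(a, b):
    """Cosine similarity between two equal-length feature vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

def one_shot_classify(query, support):
    """Return the class whose single support example is most similar to query.

    `support` maps a class name to exactly one feature vector (one shot per class).
    """
    return max(support, key=lambda label: cosine_similarity(query, support[label]))

# Hypothetical feature vectors standing in for embeddings from a trained network.
support = {
    "fighting": [0.9, 0.1, 0.0],
    "robbery": [0.1, 0.9, 0.1],
}
print(one_shot_classify([0.8, 0.2, 0.1], support))  # fighting

# A new class needs only one reference example; no weights are retrained.
support["arson"] = [0.0, 0.1, 0.9]
print(one_shot_classify([0.1, 0.0, 0.95], support))  # arson
```

Because the support set is only a lookup table of reference embeddings, extending it to a new anomaly class costs one forward pass rather than a retraining run.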
Figure 1: The proposed framework for one-shot anomaly recognition using a 3D CNN siamese network. The shot in the sliding window is compared with different example anomaly shots, and the example for which the network outputs "same" is taken as the recognized anomaly.
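The matching procedure in Figure 1 can be sketched as follows. This is a hypothetical illustration: `embed_clip` simply averages per-frame feature vectors as a stand-in for the 3D CNN, and the siamese head uses the common weighted-L1-distance-plus-sigmoid form, which is an assumption rather than this paper's stated design. Each window position receives the label of the best-matching example shot if its "same" probability reaches the threshold, and "normal" otherwise, which also shows how per-window (temporal) labels can be produced for a whole video.

```python
import math

def embed_clip(frames):
    """Hypothetical stand-in for a 3D CNN: average the per-frame feature vectors."""
    n = len(frames)
    return [sum(values) / n for values in zip(*frames)]

def siamese_same_prob(f1, f2, weights, bias):
    """Assumed siamese head: sigmoid of a weighted L1 distance between embeddings.

    Negative weights make larger distances yield lower "same" probabilities.
    """
    score = bias + sum(w * abs(a - b) for w, a, b in zip(weights, f1, f2))
    return 1.0 / (1.0 + math.exp(-score))

def label_video(frames, examples, window, weights, bias, threshold=0.5):
    """Slide a window over the video and return one label per window position."""
    example_feats = {name: embed_clip(shot) for name, shot in examples.items()}
    labels = []
    for start in range(len(frames) - window + 1):
        feat = embed_clip(frames[start:start + window])
        probs = {name: siamese_same_prob(feat, ef, weights, bias)
                 for name, ef in example_feats.items()}
        best = max(probs, key=probs.get)
        # Output "same" (recognized anomaly) only if the best match is confident enough.
        labels.append(best if probs[best] >= threshold else "normal")
    return labels

# Toy video: three "normal" frames followed by three frames resembling the example shot.
frames = [[0.0, 0.0] for _ in range(3)] + [[1.0, 0.0] for _ in range(3)]
examples = {"fighting": [[1.0, 0.0], [1.0, 0.0]]}
labels = label_video(frames, examples, window=2, weights=[-10.0, -10.0], bias=3.0)
print(labels)  # ['normal', 'normal', 'normal', 'fighting', 'fighting']
```

In a real system the weights and bias would come from training the siamese network; the hand-picked values here only make the toy example separable.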
Current one-shot learning methods lean toward meta-learning approaches. The basic idea of meta-learning is to exploit knowledge obtained from prior learning experience to learn future tasks more efficiently [6, 7]. Several approaches in the literature address one- and few-shot learning. For instance, one category treats deep neural networks as feature-encoding learners and trains a separate meta-learner that either learns the update rules [8-10] or directly generates weights for the inference model [11]. In this way, the meta-learner guides the inference model to quickly adapt its parameters to each specific task. In contrast, rather than learning the parameter updates, MAML [12] focuses on finding initial parameters that generalize well across similar tasks and make task-specific fine-tuning more efficient. Other approaches have demonstrated that memory-augmented neural networks can act as meta-learners. For instance, the method in [13] used a Neural Turing Machine as the base model and trained it so that its memory could quickly encode and retrieve new information.
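For readers unfamiliar with MAML [12], its bilevel update can be summarized as below. The notation is introduced here only for illustration: $f_\theta$ is the model with parameters $\theta$, $\mathcal{T}_i$ is a task drawn from a task distribution $p(\mathcal{T})$, $\mathcal{L}_{\mathcal{T}_i}$ is that task's loss, and $\alpha$, $\beta$ are the inner and outer step sizes. The inner step adapts $\theta$ to each task with a gradient step; the outer step then moves the shared initialization $\theta$ so that the adapted parameters perform well, which is what "finding initial parameters that generalize across tasks" amounts to.

```latex
\begin{aligned}
\theta_i' &= \theta - \alpha \, \nabla_{\theta} \, \mathcal{L}_{\mathcal{T}_i}\!\left(f_{\theta}\right) \\
\theta &\leftarrow \theta - \beta \, \nabla_{\theta} \sum_{\mathcal{T}_i \sim p(\mathcal{T})} \mathcal{L}_{\mathcal{T}_i}\!\left(f_{\theta_i'}\right)
\end{aligned}
```

Note that the outer gradient is taken with respect to $\theta$ through the adapted parameters $\theta_i'$, in contrast to the update-rule meta-learners [8-10], which learn the update step itself.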
978-1-7281-6926-2/20/$31.00 ©2020 IEEE