One-Shot Learning for Surveillance Anomaly Recognition using Siamese 3D CNN

Amin Ullah, Intelligent Media Laboratory, Digital Contents Research Institute, Sejong University, Seoul, South Korea, aminullah@ieee.org
Khan Muhammad, Intelligent Media Laboratory, Digital Contents Research Institute, Sejong University, Seoul, South Korea, khan.muhammad@ieee.org
Killichbek Haydarov, Intelligent Media Laboratory, Digital Contents Research Institute, Sejong University, Seoul, South Korea, kilichbek.haydarov@gmail.com
Ijaz Ul Haq, Intelligent Media Laboratory, Digital Contents Research Institute, Sejong University, Seoul, South Korea, ijazulhaq@ieee.org
Miyoung Lee, Intelligent Media Laboratory, Digital Contents Research Institute, Sejong University, Seoul, South Korea, miylee@gmail.com
Sung Wook Baik*, Intelligent Media Laboratory, Digital Contents Research Institute, Sejong University, Seoul, South Korea, sbaik@sejong.ac.kr

Abstract—One-shot image recognition has been explored for many applications in the computer vision community. However, its application to video analytics has not yet been deeply investigated. For instance, surveillance anomaly recognition is an open and challenging problem, and one of its hurdles is the lack of accurate, temporally annotated data. This paper addresses the lack of data using a one-shot learning strategy and proposes an anomaly recognition framework that exploits a 3D CNN siamese network to yield the similarity between two anomaly sequences. This paper also investigates existing 3D CNNs for this task and then proposes a lightweight 3D CNN model that efficiently handles one-shot anomaly recognition. Once the network is trained, its discriminative 3D CNN features can be used to predict anomalies not only for new data but also for entirely new classes. The proposed model is trained using the temporally annotated test set of the UCF Crime dataset.
Finally, the trained model is used to recognize anomalies and automatically produce temporal labels for the training set of the dataset, which is only weakly annotated at the video level.

Keywords—Artificial intelligence, deep learning, convolutional neural network, anomaly recognition, siamese network, one-shot learning.

I. INTRODUCTION

Surveillance cameras are one of the most reliable sources for the investigation of crime and anomaly scenes. Advancements in computer vision and artificial intelligence have taken this one step further by detecting and recognizing anomalies in real time, which enables instantaneous reporting systems [1]. Most high-performing methods are based on deep neural network architectures that rely on massive annotated video datasets and powerful computational resources for training [2]. In addition, these models require retraining whenever a new class must be added to a classification task [3]. These requirements complicate the training of neural networks. In such scenarios, one-shot learning offers a potential solution: it learns to perform a classification task from only a single sample of each possible class, even when data is scarce [4]. Learning under this constraint removes the need to retrain models for new classes and facilitates learning in dynamically changing data environments [5].

Figure 1: The proposed framework for one-shot anomaly recognition using a 3D CNN siamese network. The sliding-window shot is compared with different example anomaly shots, and the example for which the network outputs "same" is taken as the recognized anomaly.

Current methods for one-shot learning lean towards meta-learning approaches. The basic idea of meta-learning is to exploit knowledge obtained from prior learning experience to learn more efficiently in future tasks [6, 7]. Several approaches in the literature address one- and few-shot learning.
For instance, one category of such approaches treats deep neural networks as learners of feature encodings and trains a separate meta-learner that either learns the update rules [8-10] or directly generates weights for the inference model [11]. In this way, the meta-learner directs the inference model to quickly adjust its parameters to each specific task. On the other hand, instead of learning how to update the parameters, MAML [12] focuses on finding optimal initial parameters that generalize well across similar tasks and make task-specific fine-tuning more efficient. Similarly, some approaches have demonstrated that neural networks with augmented memory capabilities can act as meta-learners. For instance, the method presented in [13] used a Neural Turing Machine as the base model and trained it in such a way that the memory can encode and retrieve new information quickly.

978-1-7281-6926-2/20/$31.00 ©2020 IEEE
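
To make the idea behind MAML concrete, the following is a minimal sketch of its two-level optimization on a toy problem. It is not the implementation from [12]. It assumes hypothetical one-dimensional regression tasks y = a * x fitted by a single weight w, so that the gradient through the inner fine-tuning step can be written by hand. The names `adapt` and `maml_train`, the inputs `XS`, the task slopes, and all learning rates are illustrative assumptions.

```python
# Toy MAML sketch: tasks are 1-D regressions y = a * x, the model is y_hat = w * x.
# All names and hyperparameters are illustrative assumptions, not values from [12].

XS = [-1.0, -0.5, 0.5, 1.0]            # inputs shared by every toy task
C = sum(x * x for x in XS) / len(XS)   # mean(x^2); appears in the loss and its gradient


def loss(w, a):
    """Mean squared error of weight w on the task whose true slope is a."""
    return C * (w - a) ** 2


def grad(w, a):
    """Derivative of loss(w, a) with respect to w."""
    return 2.0 * C * (w - a)


def adapt(w, a, inner_lr):
    """Inner loop: one gradient step of task-specific fine-tuning."""
    return w - inner_lr * grad(w, a)


def maml_train(task_slopes, w0=0.0, inner_lr=0.1, outer_lr=0.1, steps=500):
    """Outer loop: update the initialization so the loss after adaptation is small."""
    w = w0
    for _ in range(steps):
        meta_grad = 0.0
        for a in task_slopes:
            w_adapted = adapt(w, a, inner_lr)
            # Chain rule through the inner step:
            # d(w_adapted)/d(w) = 1 - 2 * inner_lr * C
            meta_grad += grad(w_adapted, a) * (1.0 - 2.0 * inner_lr * C)
        w -= outer_lr * meta_grad / len(task_slopes)
    return w


if __name__ == "__main__":
    w_meta = maml_train([2.0, 3.0, 4.0])
    new_task = 3.5  # slope of a task not seen during meta-training
    print("loss after one step from w = 0:   ", loss(adapt(0.0, new_task, 0.1), new_task))
    print("loss after one step from w = meta:", loss(adapt(w_meta, new_task, 0.1), new_task))
```

In this toy setting the meta-learned initialization simply converges toward the mean task slope, so the example only illustrates the structure of the inner and outer loops. Practical implementations typically obtain the gradient through the inner update by automatic differentiation, which the hand-derived chain-rule factor stands in for here.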