Digital Signal Processing 99 (2020) 102657

A novel distributed anomaly detection algorithm based on support vector machines

Tolga Ergen a,∗, Suleyman S. Kozat b

a Department of Electrical Engineering, Stanford University, Stanford, CA 94305, USA
b Department of Electrical and Electronics Engineering, Bilkent University, Ankara, Turkey

Article history: Available online 8 January 2020

Keywords: Anomaly detection; Distributed learning; Support vector machine; Gradient based training

Abstract

In this paper, we study anomaly detection in a distributed network of nodes and introduce a novel algorithm based on Support Vector Machines (SVMs). We first reformulate the conventional SVM optimization problem for a distributed network of nodes. We then directly train the parameters of this SVM architecture in its primal form using a gradient based algorithm in a fully distributed manner, i.e., each node in our network is allowed to communicate only with its neighboring nodes in order to train the parameters. Therefore, we not only obtain a high performing anomaly detection algorithm thanks to the strong modeling capabilities of SVMs, but also achieve significantly reduced communication load and computational complexity due to our fully distributed and efficient gradient based training. Here, we provide a training algorithm in a supervised framework; however, we also provide extensions of our implementation to an unsupervised framework. We illustrate the performance gains achieved by our algorithm via several benchmark real life and synthetic experiments.

© 2020 Elsevier Inc. All rights reserved.

1. Introduction

Anomaly detection has been extensively studied in the current literature due to its various applications such as health care, fraud detection, network monitoring, cybersecurity, and military surveillance [15].
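To make the abstract's approach concrete, the following is a minimal sketch of primal SVM training by decentralized subgradient descent: each node takes a gradient step on its local regularized hinge loss and then averages its parameters with its neighbors. This is a generic illustration of the idea, not the authors' exact algorithm; the function names, the consensus (neighbor-averaging) step, and all hyperparameters are assumptions made for the example.

```python
import numpy as np

def local_subgradient(w, b, X, y, C):
    """Subgradient at one node of the local primal SVM objective
    (1/2)||w||^2 + C * sum_i max(0, 1 - y_i (w.x_i + b))."""
    margins = y * (X @ w + b)
    active = margins < 1.0  # samples violating the margin contribute
    gw = w - C * (y[active][:, None] * X[active]).sum(axis=0)
    gb = -C * y[active].sum()
    return gw, gb

def distributed_svm(data, labels, adjacency, C=1.0, lr=0.01, epochs=200):
    """Each node keeps its own (w, b); after every local gradient step it
    averages parameters with its neighbors only (a consensus step),
    so no node ever needs global information."""
    K, d = len(data), data[0].shape[1]
    W = np.zeros((K, d))
    B = np.zeros(K)
    # Row-stochastic combination weights built from the adjacency matrix
    # (each node mixes its own parameters with its neighbors').
    A = adjacency + np.eye(K)
    A = A / A.sum(axis=1, keepdims=True)
    for _ in range(epochs):
        for k in range(K):
            gw, gb = local_subgradient(W[k], B[k], data[k], labels[k], C)
            W[k] -= lr * gw
            B[k] -= lr * gb
        # Communication step: exchange parameters with neighbors only.
        W, B = A @ W, A @ B
    return W, B
```

The consensus matrix `A` here encodes the network topology; because each row only weights a node's own neighbors, the multiplication `A @ W` requires only local communication, which is the source of the reduced communication load the abstract refers to.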
Particularly, in this paper, we study the anomaly detection problem in a distributed framework, where we have a network of K nodes equipped with processing capabilities. Each node in our network observes a data sequence and aims to decide whether the observations are anomalous or not. Here, each node has a set of neighboring nodes and can share information with its neighbors in order to enhance the detection performance.

Various other anomaly detection algorithms have also been introduced in order to enhance the detection performance. As an example, Fisher kernels and generative models have been introduced to gain performance improvements, especially for time series data [6–9]. However, the main drawback of the Fisher kernel model is that it requires the inversion of the Fisher information matrix, which has a high computational complexity [6,7]. On the other hand, in order to obtain an adequate performance from a generative model such as a Hidden Markov Model (HMM), one should carefully select its structural parameters, e.g., the number of states and the topology of the model [8,9]. Furthermore, the type of training algorithm also has a considerable effect on the performance of generative models, which limits their usage in real life applications [9]. Thus, neural network based approaches, especially those based on Recurrent Neural Networks (RNNs), have been introduced thanks to their inherent memory structure, which can store "time" or "state" information, and their strong modeling capabilities [1,10]. Since the basic RNN architecture does not have control structures (gates) to regulate the amount of information to be stored [11,12], a more advanced RNN architecture with several control structures, i.e., the Long Short Term Memory (LSTM) network, is usually employed among RNNs [12,13].

This paper is in part supported by Tubitak Project No: 117E153.
∗ Corresponding author.
E-mail addresses: ergen@stanford.edu (T. Ergen), kozat@ee.bilkent.edu.tr (S.S. Kozat).
However, such neural network based approaches do not have a proper objective criterion for anomaly detection tasks, especially in the absence of data labels [1,14]. Hence, they first predict a sequence from its past samples and then determine whether the sequence is anomalous or not based on the prediction error, i.e., an anomaly is an event that cannot be predicted from the past nominal data [1]. Thus, they require a probabilistic model for the prediction error and a threshold on that model to detect anomalies, which results in challenging optimization problems and restricts their performance accordingly [1,14,15]. Furthermore, one needs a considerable amount of computational power and time to adequately train such networks due to their highly complex and nonlinear structure, which also makes them susceptible to overfitting problems. In order to handle large scale anomaly detection problems,

https://doi.org/10.1016/j.dsp.2020.102657
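As an aside, the prediction-error based detection scheme discussed in the introduction can be sketched in a few lines: fit a simple probabilistic model to the prediction errors on nominal data, then flag samples whose error is improbable under that model. This is a generic illustration, not the cited authors' method; the Gaussian error model and the threshold parameter `alpha` are assumptions made for the example.

```python
import numpy as np

def fit_error_model(nominal_errors):
    """Fit a simple Gaussian model to prediction errors measured
    on nominal (non-anomalous) data."""
    return nominal_errors.mean(), nominal_errors.std()

def is_anomaly(errors, mu, sigma, alpha=3.0):
    """Flag samples whose prediction error deviates by more than
    alpha standard deviations from the nominal mean (alpha is a
    hypothetical threshold chosen for illustration)."""
    return np.abs(errors - mu) > alpha * sigma
```

The difficulty the text points to is visible even in this toy version: both the error model and the threshold must be chosen by the designer, and the detection performance hinges on those choices rather than on a single end-to-end objective.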