SDVAR Algorithm for Detecting Fraud in Telecommunications

Fatimah Almah Saaid, Darfiana Nur and Robert King, University of Newcastle, Australia

Abstract: This paper presents a procedure for estimating a vector autoregressive (VAR) model using the Sequential Discounting VAR (SDVAR) algorithm, an online model-learning algorithm, to detect fraudulent acts from telecommunications call detail records (CDR). Following the work of [1], the volatility of the VAR is observed, allowing for non-linearity, outliers and change points. This paper extends their procedure from univariate to multivariate time series. A simulation and a case study on detecting telecommunications fraud using CDR illustrate the use of the algorithm in the bivariate setting.

Keywords: Telecommunications Fraud, SDVAR Algorithm, Multivariate time series, Vector Autoregressive, Change points.

I. INTRODUCTION

TELECOMMUNICATION companies face fraudulent acts from time to time, which greatly affect the industry's revenue. Although the exact losses due to fraud may not be made public, the problem has become global and affects most telecommunications companies. One of the many ways to detect voice fraud is via the call detail records (CDR). CDRs are massive collections of call histories generated in real time, and they are among the largest sources of real-time data [2]. As part of a larger research project on detecting fraudulent acts from telecommunications CDR, one task is to locate the change points that could lead to the detection of suspicious (fraudulent) calls.

The aim of this paper is to detect change points in the CDR (as indicators of fraudulent acts) by adopting the unified detection scheme introduced by [1], with the model-learning algorithm extended to multivariate time series. The algorithm, called Sequential Discounting for Vector Autoregressive (SDVAR), is proposed to detect fraud as soon as it occurs.

The remainder of this paper is organized as follows. The following section reviews previous work on change point detection.
Section III provides the basic concepts of vector autoregressive models. Section IV discusses the method designed for the study. The results of the simulation and the case study are discussed in Section V. The last section gives some discussion and conclusions.

F.A. Saaid is with the School of Mathematical and Physical Sciences, University of Newcastle, Callaghan 2308, NSW, Australia, e-mail: Fatimah.Saaid@uon.edu.au.
D. Nur is with the School of Mathematical and Physical Sciences, University of Newcastle, Callaghan 2308, NSW, Australia, e-mail: Darfiana.Nur@newcastle.edu.au.
R. King is with the School of Mathematical and Physical Sciences, University of Newcastle, Callaghan 2308, NSW, Australia, e-mail: Robert.King@newcastle.edu.au.

II. CHANGE POINTS

Change point detection has been used in diverse fields. [3] proposed a geometric method for estimating linear state-space models to identify change points in time-series data. Bayesian change point analysis was applied by [4] to detect regions of genetic alteration in cancer research. Change point detection has also been applied to the annual number of tropical cyclones [5]. In signal processing, change point detection based on singular spectrum analysis was applied by [6]. In a recent study, [7] combined wavelet denoising with a sequential approach to detect change points for mobile phones based on the CDR.

Network fault monitoring was studied by [1], who introduced a two-stage learning scheme to detect outliers and change points in a unifying framework, ChangeFinder. The scheme employs an autoregressive process whose model is learned using the Sequential Discounting for Autoregressive (SDAR) algorithm, which was also used by [8]. The key advantage of the algorithm is that it adapts to non-stationary time series.

In this paper we study call behaviour from the CDR by developing growth profiles for unique subscribers.
The profiles serve as reference profiles for normal callers, so that a deviation (change) from this normal behaviour leads to the identification of a suspicious call (act of fraud). To the authors' best knowledge, the detection scheme proposed by [1] has not been studied in the context of multivariate time series. Because more than one variate is needed to describe the dynamic behaviour of a time series, especially when the aim is to detect change points from the CDR, such a study is warranted.

III. VECTOR AUTOREGRESSIVE (VAR) MODELS

The autoregressive (AR) model is the most typical time series model for predicting the current value from past values of the same univariate time series. The number of past values (or lagged values) is referred to as the order of the model. However, with the increasing interest in modelling series with more than one variable, a multivariate time series model is required. In a VAR model, also known as a multivariate AR (MAR) model, the value of each variable at each time point is predicted from the past values of the same series and those of all other series, depending on the variables used in the model.

Consider a VAR model of order $p$, where $N$ is the length of each of the $m$ series. Let $x_t = [x_{1,t}, \ldots, x_{m,t}]^T$ denote the $(m \times 1)$ vector of time series variables. Then the VAR($p$) model is given by

$$x_t = \mu + \Phi_1 x_{t-1} + \cdots + \Phi_p x_{t-p} + \varepsilon_t, \tag{1}$$

where $t = 1, \ldots, N$, the $\Phi_k$, $k = 1, \ldots, p$, are $(m \times m)$ coefficient matrices, $\mu$ is an $(m \times 1)$ vector and $\varepsilon_t$ is an $(m \times 1)$ vector of i.i.d. Gaussian noise with mean $0$ and covariance matrix $\Sigma_\varepsilon$. The mean of the $i$-th series, $i = 1, \ldots, m$, is given as $E[x_{i,t}] = \mu_i$.
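As an illustration of model (1), the following Python sketch simulates a bivariate VAR(1) series and computes a one-step prediction from past values. It is a minimal sketch under stated assumptions, not the SDVAR procedure itself: the function names `simulate_var` and `predict_next`, the parameter values, and the choice of zero pre-sample values are hypothetical and do not come from the paper.

```python
import numpy as np


def simulate_var(mu, phis, sigma, n, seed=0):
    """Simulate n observations from x_t = mu + sum_k Phi_k x_{t-k} + eps_t."""
    rng = np.random.default_rng(seed)
    mu = np.asarray(mu, dtype=float)
    phis = [np.asarray(phi, dtype=float) for phi in phis]
    m, p = mu.shape[0], len(phis)
    # Assumption: the p pre-sample values x_{1-p}, ..., x_0 are set to zero.
    x = np.zeros((n + p, m))
    # i.i.d. Gaussian noise with mean 0 and covariance matrix sigma.
    eps = rng.multivariate_normal(np.zeros(m), sigma, size=n)
    for t in range(p, n + p):
        lagged = sum(phis[k] @ x[t - 1 - k] for k in range(p))
        x[t] = mu + lagged + eps[t - p]
    return x[p:]


def predict_next(mu, phis, history):
    """One-step prediction of x_t from the last p rows of history."""
    mu = np.asarray(mu, dtype=float)
    history = np.asarray(history, dtype=float)
    lagged = sum(
        np.asarray(phi, dtype=float) @ history[-1 - k]
        for k, phi in enumerate(phis)
    )
    return mu + lagged


# Hypothetical bivariate VAR(1) parameters, chosen only for illustration.
mu = [0.5, -0.2]
phis = [[[0.5, 0.1], [0.2, 0.3]]]
sigma = [[1.0, 0.3], [0.3, 1.0]]

series = simulate_var(mu, phis, sigma, n=200, seed=1)
forecast = predict_next(mu, phis, series)
```

Here `series` has one row per time point and one column per variate, and `forecast` is the predicted vector for the time point following the last simulated observation. The gap between such predictions and the observed values is the kind of quantity a change point detector can monitor.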
World Academy of Science, Engineering and Technology International Journal of Mathematical and Computational Sciences Vol:6, No:5, 2012 549 International Scholarly and Scientific Research & Innovation 6(5) 2012 scholar.waset.org/1307-6892/8767 International Science Index, Mathematical and Computational Sciences Vol:6, No:5, 2012 waset.org/Publication/8767