Probabilistic Graphical Model on Detecting Insiders: Modeling with SGD-HMM

Ahmed Saaudi, Yan Tong and Csilla Farkas
Department of Computer Science & Engineering, University of South Carolina, 550 Assembly St., Columbia, U.S.A.

Keywords: Insider Threat, Anomaly Detection System, Machine Learning, HMM, Big Data.

Abstract: This paper presents a novel approach to detecting malicious behaviors in computer systems. We propose the use of varying granularity levels to represent users' log data: Session-based, Day-based, and Week-based. A user's normal behavior is modeled using a Hidden Markov Model, and the model is used to detect any deviation from that normal behavior. We also propose a Sliding Window Technique that identifies malicious activity effectively by considering the near history of user activity. We evaluate our results using Receiver Operating Characteristic (ROC) curves. Our evaluation shows that the approach outperforms existing research, improving detection ability and reducing the false positive rate. Combining the sliding window technique with the session-based system gives fast detection performance.

1 INTRODUCTION

Insiders' misuse of computer systems is a major concern for many organizations. The Breach Level Index (Gemalto, 2016), a public record of data breaches collected and distributed by Gemalto, reports that around 40% of data leakage attacks are due to insiders' misuse. The data leakages are scored according to their importance. The risk scores of malicious insider threats are highest in the USA and China: 9.4 and 9.1, respectively. Additionally, recent studies (Gavai et al., 2015; House, 2012; Cappelli, 2012; Institute, 2017) show that the insider threat rate has increased compared to 2015. The mean time to detect such malicious data breaches is 50 days (Clearswift, 2018; Cappelli, 2012; Institute, 2017).

Several solutions have been proposed to deal with the insider threat.
Most of them define suspicious behaviors as low-frequency actions performed by a user, so unusual behaviors can be compared with high-frequency behaviors to predict abnormality. The activities can be captured by tracing log data within a specific time unit. The actions' log data can be pre-processed so that it can be modeled using machine learning techniques (Rashid et al., 2016). However, none of these studies addresses the fact that a long time period is needed to detect malicious behaviors.

In this paper, the raw data from five different domains, "Log on/Log off," "Connect/Disconnect," "Http," "Emails," and "Files," are pre-processed to generate new sequence data samples. Multiple domains show different aspects of user behavior, which helps our model detect malicious behavior. The new data samples are generated according to the detection time unit: Session-based sequences, Day-based sequences, and Week-based sequences. In this paper, we present our results of the session-based analysis.

We propose an unsupervised detection approach to monitor user actions and detect abnormal behaviors. A user's behavior is represented as a series of activities performed within the organizational environment. To identify unusual sequences of actions, a stochastic gradient descent version of HMM, "HMM-SGD", is proposed to model the sequence of user activities. The new model offers training flexibility because it contains four hyper-parameters, which can be tuned to improve model convergence.

Our contributions in this work can be summarized as follows:

1. Processing the raw log data into session-based, day-based, and week-based sequences. Data samples at multiple granularity levels help to discover abnormal behaviors that are distributed over time.
2. Proposing a sliding window technique to consider the effect of the recent history of user activities on their current behavior.

Saaudi, A., Tong, Y.
and Farkas, C. Probabilistic Graphical Model on Detecting Insiders: Modeling with SGD-HMM. DOI: 10.5220/0007404004610470
In Proceedings of the 5th International Conference on Information Systems Security and Privacy (ICISSP 2019), pages 461-470. ISBN: 978-989-758-359-9
Copyright © 2019 by SCITEPRESS – Science and Technology Publications, Lda. All rights reserved
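
The session-based, day-based, and week-based sequences described in the introduction, together with a window over recent history, can be illustrated with a minimal Python sketch. It is not the authors' implementation. The event records, the action names, the rule that a session runs from a "logon" to the next "logoff", and the helper names (`session_sequences`, `day_sequences`, `week_sequences`, `sliding_windows`) are all assumptions made for illustration. This section does not specify how the proposed sliding window combines history, so the helper below simply pairs each sequence with its immediate predecessors.

```python
from collections import defaultdict
from datetime import datetime

# Hypothetical log records for one user: (timestamp, action).
# The layout and action names are assumptions, not the dataset's real schema.
EVENTS = [
    ("2010-01-04 08:00", "logon"), ("2010-01-04 08:05", "http"),
    ("2010-01-04 12:00", "logoff"), ("2010-01-04 13:00", "logon"),
    ("2010-01-04 13:10", "email"), ("2010-01-04 17:00", "logoff"),
    ("2010-01-05 08:00", "logon"), ("2010-01-05 08:30", "file"),
    ("2010-01-05 17:00", "logoff"), ("2010-01-11 09:00", "logon"),
    ("2010-01-11 09:15", "connect"), ("2010-01-11 16:00", "logoff"),
]


def _ordered(events):
    """Parse timestamps and return (datetime, action) pairs in time order."""
    parsed = [(datetime.strptime(ts, "%Y-%m-%d %H:%M"), a) for ts, a in events]
    return sorted(parsed, key=lambda pair: pair[0])


def _group(events, key_fn):
    """Group actions into sequences that share the same key_fn(timestamp)."""
    groups = defaultdict(list)  # dicts keep insertion order, so time order holds
    for when, action in _ordered(events):
        groups[key_fn(when)].append(action)
    return list(groups.values())


def day_sequences(events):
    return _group(events, lambda t: t.date())


def week_sequences(events):
    # Key on the ISO (year, week number) pair.
    return _group(events, lambda t: t.isocalendar()[:2])


def session_sequences(events):
    """Assumption: a session runs from a 'logon' to the next 'logoff'."""
    sessions, current = [], []
    for _when, action in _ordered(events):
        if action == "logon" and current:
            sessions.append(current)  # previous session was never closed
            current = []
        current.append(action)
        if action == "logoff":
            sessions.append(current)
            current = []
    if current:
        sessions.append(current)
    return sessions


def sliding_windows(sequences, width):
    """Pair each sequence with up to width - 1 sequences that precede it."""
    return [sequences[max(0, i - width + 1): i + 1]
            for i in range(len(sequences))]


if __name__ == "__main__":
    sessions = session_sequences(EVENTS)
    print(len(sessions), len(day_sequences(EVENTS)), len(week_sequences(EVENTS)))
    print(sliding_windows(sessions, 2)[1])
```

Each window produced this way could then be passed to a trained sequence model so that a session is judged in the context of the sessions just before it, rather than in isolation.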