Original Article
International Journal of Fuzzy Logic and Intelligent Systems, Vol. 20, No. 1, March 2020, pp. 17-25
http://doi.org/10.5391/IJFIS.2020.20.1.17
ISSN(Print) 1598-2645 ISSN(Online) 2093-744X

Feature Engineering Method Using Double-Layer Hidden Markov Model for Insider Threat Detection

Xiaoyun Ye, Sung-Sam Hong, and Myung-Mook Han
Department of Computer Science, Gachon University, Seongnam, Korea

Abstract
In the past, most time-series approaches based on Hidden Markov Models (HMMs) used only the original, single-layer HMM. This single-layer structure has a significant limitation: it struggles to perform well when a scenario requires fine-grained adjustment. As a result, it could not model user behavior completely or flexibly. This paper performs feature extraction and analysis on time-series user behavior data. Parameters obtained by statistical methods are clustered, and data labels are added to the results to obtain the first layer of hidden states; these are then further layered according to working hours and outside working hours. The experimental results show that the method has strong applicability and flexibility and can quickly detect abnormal behavior.

Keywords: Hidden Markov Model (HMM), User behavior, Insider threat, Feature engineering, Anomaly detection.

Received: Dec. 31, 2019
Revised: Feb. 24, 2020
Accepted: Feb. 28, 2020

Correspondence to: Myung-Mook Han (mmhan@gachon.ac.kr)

©The Korean Institute of Intelligent Systems
This is an Open Access article distributed under the terms of the Creative Commons Attribution Non-Commercial License (http://creativecommons.org/licenses/by-nc/3.0/) which permits unrestricted non-commercial use, distribution, and reproduction in any medium, provided the original work is properly cited.

1.
Introduction

Rapidly developing network technology has not only changed human lifestyles and created great opportunities for development, but has also brought significant security challenges. Since the Edward Snowden incident, people have paid increasing attention to privacy protection. Cyber-attacks have always existed, but in recent years the losses caused by internal security risks have become increasingly severe. According to the IT Security Risks Survey conducted by Kaspersky Lab and B2B International, 73% of companies have been affected by intentional or unintentional internal information security incidents, and a fifth (21%) of those companies also lost valuable data that subsequently affected their business [1]. If insider threats are not stopped, such high-risk incidents are likely to become more and more common.

However, insider threats cannot easily be treated as a purely data-driven problem. Methods designed for external attacks cannot simply be reused, because internal attacks are considerably more complex than external ones. Since every employee's daily work is different, a key challenge in insider threat detection is building a profile of each employee's behavior. Insider threat detection must focus on distinguishing suspicious users from the others, yet no employee can be directly designated as an attacker. Therefore, as many parameters as possible are needed to characterize attack behavior.

The hidden Markov model (HMM) is a statistical model used for representing probability distributions over sequences of observations [2]. This model has been used in several domains, such as speech recognition [3], text understanding [4], image