Original Article
International Journal of Fuzzy Logic and Intelligent Systems
Vol. 20, No. 1, March 2020, pp. 17-25
http://doi.org/10.5391/IJFIS.2020.20.1.17
ISSN(Print) 1598-2645
ISSN(Online) 2093-744X
Feature Engineering Method Using
Double-Layer Hidden Markov Model for
Insider Threat Detection
Xiaoyun Ye, Sung-Sam Hong, and Myung-Mook Han
Department of Computer Science, Gachon University, Seongnam, Korea
Abstract
Most previous time-series approaches based on the hidden Markov model (HMM) have used only the
original single-layer HMM. The single-layer structure has a significant limitation: it is
difficult for it to play its intended role when the model must be finely adjusted to a specific
scenario, so it cannot represent user behavior completely or flexibly. This paper performs
feature extraction and analysis on time-series user behavior data. Data labels are added after
the statistically derived parameters are clustered, which yields the first hidden state, and the
model is then further layered according to working hours and non-working hours.
The experimental results show that the method has strong applicability and flexibility and can
detect abnormal behavior quickly.
Keywords: Hidden Markov Model (HMM), User behavior, Insider threat, Feature
engineering, Anomaly detection.
Received: Dec. 31, 2019
Revised: Feb. 24, 2020
Accepted: Feb. 28, 2020
Correspondence to: Myung-Mook Han
(mmhan@gachon.ac.kr)
©The Korean Institute of Intelligent Systems
This is an Open Access article distributed under the terms of the Creative
Commons Attribution Non-Commercial License
(http://creativecommons.org/licenses/by-nc/3.0/) which permits unrestricted
non-commercial use, distribution, and reproduction in any medium, provided the
original work is properly cited.
1. Introduction
Rapidly developing network technology has not only changed the way people live and brought
great development opportunities, but has also brought more significant security challenges.
Since the Edward Snowden incident, people have paid increasing attention to privacy and
security. Cyber-attacks have always existed, but in recent years the losses
caused by internal security risks have become increasingly severe. According to the IT
Security Risks Survey conducted by Kaspersky Lab and B2B International, 73% of companies
have been affected by both intentional and unintentional internal information security incidents.
Out of those, a fifth (21%) of companies also lost valuable data that subsequently affected
their business [1]. If insider threats are not stopped, such high-risk situations will become
more and more frequent. However, insider threats cannot easily be treated as a data-driven
problem. The methods used against external attacks cannot be applied directly, because internal
attacks are much more complex than external attacks. Every employee's daily work is different,
so the challenge of insider threat detection is to build a profile of each employee's behavior.
Insider threat detection needs to focus on differentiating suspicious users from the others, but
we cannot directly label an employee as an attacker. We therefore need as many parameters as
possible to characterize attack behavior. The hidden Markov model (HMM) is a statistical model
used for representing probability distributions over sequences of observations [2]. This model has
been used in several domains, such as speech recognition [3], text understanding [4], image