Network-Traffic Anomaly Detection with Incremental Majority Learning

Shin-Ying Huang, Fang Yu, Rua-Huan Tsaih, Yennun Huang

S. Y. Huang is with the Research Center for Information Technology Innovation, Academia Sinica, Taipei, Taiwan (email: smichelle19@citi.sinica.edu.tw). F. Yu is with the Department of Management Information Systems, National Chengchi University, Taipei, Taiwan (email: yuf@nccu.edu.tw). R. H. Tsaih is with the Department of Management Information Systems, National Chengchi University, Taipei, Taiwan (email: tsaih@mis.nccu.edu.tw). Y. Huang is with the Research Center for Information Technology Innovation, Academia Sinica, Taipei, Taiwan (email: yennunhuang@citi.sinica.edu.tw). We acknowledge the financial support of the Ministry of Science and Technology, Taiwan (No. 103-2221-E-001-028-MY3, No. 103-2221-E-004-006-MY3).

978-1-4799-1959-8/15/$31.00 ©2015 IEEE

Abstract- Detecting anomalous behavior in large network traffic data presents a great challenge in designing effective intrusion detection systems. We propose an adaptive model to learn majority patterns in a dynamically changing environment. We first propose unsupervised learning for data abstraction to extract essential features of samples. We then adopt incremental majority learning with iterative evolutions on fitting envelopes to characterize the majority of samples within moving windows. A network traffic sample is considered an anomaly if its abstract feature falls outside the fitting envelope. We evaluate the effectiveness of the presented approach on more than 150,000 traffic samples from the NSL-KDD dataset in training and testing, and the results show promise in detecting network attacks by identifying samples that have abnormal features.

Keywords- intrusion detection system, outlier detection, neural network, incremental learning

I. INTRODUCTION

To detect potential threats or formalize new attack patterns, intrusion detection systems (IDS) are needed that provide automated detection of malicious traffic behavior via techniques such as anomaly detection methods with incremental learning ability. In other words, the periodic analysis of network traffic is essential to help develop a behavior-based IDS that is capable of detecting zero-day attacks. Network traffic not only has high volumes but also contains imbalanced data proportions, so it is difficult to estimate the boundaries between normal and abnormal network traffic in a constantly changing environment [30]. Many methodologies have been developed to facilitate network intrusion detection. However, many of the existing methods lack incremental learning ability and have not utilized abstraction techniques to address high-dimensional features. Therefore, this study presents an unsupervised learning abstraction as a pre-process that keeps sufficient information to formalize regular network traffic behavior. An incremental majority learning approach is developed that is able to identify anomalous network traffic in a dynamically changing environment.

Detecting anomalous network traffic can be regarded as recognizing the outliers from the majority of observed network traffic. Chandola et al. [3] provided a comprehensive overview of the existing outlier detection techniques, which are classified along different dimensions. They concluded that every unique problem formulation entails a different approach, resulting in a huge literature on outlier detection techniques. Thapngam et al. [25] proposed a detection method based on Pearson's correlation coefficient. Their method can extract repeatable features from the packet arrivals in DDoS traffic but not in flash crowd traffic. Choras et al. [4] proposed a framework for network security based on the correlation approach as well as a new signal-based algorithm for intrusion detection.
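To make concrete the idea of treating anomalies as outliers from the majority within a moving window, the following is a minimal sketch rather than the model developed in this paper. It assumes a one-dimensional abstract feature per sample and substitutes a simple robust envelope (median plus or minus k times the median absolute deviation) for the fitting envelope. The names `envelope` and `detect_anomalies` and the parameters `window_size` and `k` are hypothetical and chosen only for illustration.

```python
from collections import deque
from statistics import median


def envelope(window, k=3.0):
    """Return (low, high) bounds that cover the majority of values in window.

    Assumption: median +/- k * MAD stands in for the fitting envelope;
    it is not the envelope construction used in the paper.
    """
    center = median(window)
    mad = median(abs(v - center) for v in window)
    spread = k * max(mad, 1e-9)  # guard against a zero-width envelope
    return center - spread, center + spread


def detect_anomalies(features, window_size=50, k=3.0):
    """Flag each sample whose feature falls outside the current envelope."""
    window = deque(maxlen=window_size)  # moving window of recent samples
    flags = []
    for x in features:
        if len(window) < window_size:
            # Warm-up: too few samples to characterize the majority yet.
            flags.append(False)
        else:
            low, high = envelope(window, k)
            flags.append(not (low <= x <= high))
        # The window slides over every sample; the median-based envelope
        # tolerates a minority of anomalous values inside the window.
        window.append(x)
    return flags
```

With a stream of values that stay near 1.0 and a single value of 5.0, only the latter is flagged once the warm-up period has passed. A median-based envelope is used here because it is insensitive to a small fraction of extreme values, which matches the premise that normal traffic forms the majority.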
To identify the anomalous packets from high-dimensional input features, we propose a novel abstraction technique that retains less, yet sufficient, information from monitored observations. Similar to the concept of representation learning [7] based on the abstraction of high-dimensional data, the presented approach benefits from the abstraction that can appear in high-level attributes that are only sensitive to some very specific types of changes in the input. Such deep architectures can lead to abstract representations whereby more abstract concepts can be constructed in terms of less abstract ones.

The objective of this study is to develop an adaptive model that can be used to identify anomalous network traffic from the majority of network traffic data. Usually, the pattern of normal network traffic should not change dramatically, and it would be helpful to observe the normal behavior and separate the anomalies in a monitored network scope within a time window. We aim to develop an incremental majority learning model with a moving-window scheme that is capable