(IJCSIS) International Journal of Computer Science and Information Security, Vol. 14, No. 6, June 2016

An optimized approach toward intrusion detection using cluster-like behavior of attacks

Aliakbar Tajari Siahmarzkooh
Faculty of Mathematical Sciences, Department of Computer Science, University of Tabriz, Tabriz, Iran

Jaber Karimpour
Faculty of Mathematical Sciences, Department of Computer Science, University of Tabriz, Tabriz, Iran

Shahriar Lotfi
Faculty of Mathematical Sciences, Department of Computer Science, University of Tabriz, Tabriz, Iran

Abstract— Most intrusion detection research suffers from two drawbacks: it does not account for dependencies between network nodes or for the cluster-like behavior of anomalies. Hence, this paper proposes a cluster-based approach in which anomalies are detected using a new criterion related to the behavior of attacks. In addition, we provide a cluster-based data set that uses flow-based data and graph properties to model network traffic over time. The data set is built on the DARPA data set. The anomalies are revealed by means of a criterion computed from the internal and external weights of clusters. Finally, the proposed approach is evaluated and compared with other approaches. The evaluation results show that our approach compares favorably with the others.

Keywords- Anomaly; DARPA data set; flow; graph clustering; intrusion detection

I. INTRODUCTION

Intrusion detection systems (IDSs) are divided into packet-based and flow-based categories. In packet-based IDSs, all network packets passing through a given point are collected and analyzed. A packet consists of two parts: the header, which carries information about the packet's source and destination, and the content, which carries the data. Packet-based IDSs inspect both parts to detect anomalies. In contrast, flow-based IDSs operate on network flows.
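To make the contrast concrete, the following is a minimal sketch of how packets might be reduced to flow records. It is an illustration, not the method of any cited work. The `Packet` type, the `aggregate_flows` function, and the choice of the (source, destination) address pair as the flow key are assumptions made here; flow exporters such as NetFlow typically also include ports and protocol in the key.

```python
from typing import Dict, Iterable, NamedTuple, Tuple


class Packet(NamedTuple):
    """Hypothetical packet record used only for this illustration."""
    timestamp: float  # capture time in seconds
    src: str          # source address (from the header)
    dst: str          # destination address (from the header)
    size: int         # packet length in bytes
    payload: bytes    # packet content; never read by the flow-based view


def aggregate_flows(packets: Iterable[Packet]) -> Dict[Tuple[str, str], dict]:
    """Group packets into flow records keyed by (src, dst).

    Assumption: a flow is identified by its address pair only. Only
    header-derived counters are kept; the payload is ignored.
    """
    flows: Dict[Tuple[str, str], dict] = {}
    for p in packets:
        key = (p.src, p.dst)
        record = flows.get(key)
        if record is None:
            flows[key] = {
                "packets": 1,
                "bytes": p.size,
                "start": p.timestamp,
                "end": p.timestamp,
            }
        else:
            record["packets"] += 1
            record["bytes"] += p.size
            record["start"] = min(record["start"], p.timestamp)
            record["end"] = max(record["end"], p.timestamp)
    return flows


packets = [
    Packet(0.0, "10.0.0.1", "10.0.0.2", 60, b"SYN"),
    Packet(0.5, "10.0.0.1", "10.0.0.2", 1500, b"data"),
    Packet(0.7, "10.0.0.3", "10.0.0.2", 40, b"ping"),
]
flows = aggregate_flows(packets)
```

Three packets collapse into two flow records here, which suggests why a flow-based system has less data to examine than a packet-based one.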
An important property of this method is that it does not include the content of the packet; it contains only information such as the source and destination addresses. Flow-based IDSs therefore speed up the intrusion detection process and are suitable for high-speed networks, which addresses the scalability problem in network security [1], [2], [3].

More specifically, one approach to the scalability problem in packet-based IDSs is packet header extraction. Mahoney et al. [4] proposed an approach based on packet header extraction in which anomalies are detected by learning the normal values for each packet header. In addition, Manandhar et al. [5] proposed an approach that uses traffic data and checks packet header information for anomaly detection. Karimpour et al. [6] proposed a flow-based clustering algorithm to detect attacks in the DARPA data set, using suitable time intervals and threshold points to reveal the attacks with high accuracy.

Comparing the two methods, the flow-based IDS is more suitable than the packet-based one in high-speed networks. Moreover, flow-based approaches analyze flow data instead of the contents of packets. In 2010, Sperotto et al. [7] devised an approach based on network flows that uses time series to reveal attacks. That study shows the performance of flow-based IDSs in comparison with packet-based ones and proposes a data set for evaluating flow-based IDSs. In 2012, Hellemons et al. [8] proposed another flow-based method. Their work has two parts: the first designs a high-performance algorithm for intrusion detection, and the second implements a prototype of the IDS. Specifically, the authors proposed an algorithm to detect dictionary attacks. The algorithm splits an attack into two or more phases.
The criteria used in the algorithm are the packets-per-flow criterion and the minimum number of flows, both calculated over 1-minute time intervals. Based on this algorithm, threshold points are set for each phase of the attack. The authors assumed that a dictionary attack has three phases: scan, brute-force, and die-off. They detected the attack with high accuracy by applying threshold points to these three phases.

Graph-based intrusion detection systems are security approaches that use the properties of the network rather than the content of packets. These systems detect intrusions by analyzing network graphs and can detect large-scale attacks such as worms. In this area, Zhou et al. [9] proposed an approach that uses graphs to represent multivariate time series and their relations at each point in time. In 2007, Iliofotou et al. [10] proposed an approach to monitor and analyze network traffic using traffic dispersion graphs (TDGs). They defined a TDG as a graphical representation of the interactions among groups of nodes. The advantage of the TDG is its ability to represent the structural relations of attacks. Another approach, proposed by Le et al. [11] in 2011, is based on graph theory fundamentals such as node degree, the maximum degree of the graph, and the similarity distance of graphs.

486 https://sites.google.com/site/ijcsis/ ISSN 1947-5500
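The graph properties mentioned above, node degree and the maximum degree of a traffic graph, can be sketched as follows. This is an illustrative sketch under stated assumptions, not the construction used in [10] or [11]. The function names `build_graph`, `node_degrees`, and `max_degree` are hypothetical, and the graph is treated as undirected with one edge per communicating pair of hosts.

```python
from collections import defaultdict
from typing import Dict, Iterable, Set, Tuple


def build_graph(flows: Iterable[Tuple[str, str]]) -> Dict[str, Set[str]]:
    """Build adjacency sets from (source, destination) flow pairs.

    Assumption: edges are undirected and repeated flows between the
    same two hosts count as a single edge.
    """
    adjacency: Dict[str, Set[str]] = defaultdict(set)
    for src, dst in flows:
        if src == dst:
            continue  # ignore self-loops
        adjacency[src].add(dst)
        adjacency[dst].add(src)
    return dict(adjacency)


def node_degrees(adjacency: Dict[str, Set[str]]) -> Dict[str, int]:
    """Degree of a node = number of distinct hosts it communicated with."""
    return {node: len(neighbors) for node, neighbors in adjacency.items()}


def max_degree(adjacency: Dict[str, Set[str]]) -> int:
    """Maximum degree over all nodes; 0 for an empty graph."""
    degrees = node_degrees(adjacency)
    return max(degrees.values()) if degrees else 0


# Hypothetical flows: host "a" contacts three other hosts.
flows = [("a", "b"), ("a", "c"), ("a", "d"), ("b", "c"), ("a", "b")]
graph = build_graph(flows)
degrees = node_degrees(graph)
highest = max_degree(graph)
```

In this example host `a` has the highest degree because it contacts every other host. Under these assumptions, a host that suddenly reaches many distinct peers, as a scanning worm might, would stand out through the same measure.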