International Journal of Computer Applications (0975 – 8887) Volume 75 – No.6, August 2013

Data Preprocessing for Intrusion Detection System using Swarm Intelligence Techniques

S. Revathi, Ph.D. Research Scholar, Government Arts College, Coimbatore-18
A. Malathi, Ph.D., Assistant Professor, Government Arts College, Coimbatore-18

ABSTRACT
Because of the malicious data accessible on the internet, the Intrusion Detection System (IDS) has become an important element of system security. An IDS handles real-time data, which leads to a high-dimensional problem, so data pre-processing is necessary to reduce ambiguity and to clean network data. To reduce the false positive rate and increase detection efficiency, this paper proposes a new swarm intelligence technique for solving this complex optimization problem. The work is based on a hybrid Simplified Swarm Optimization (SSO) algorithm for pre-processing the data. SSO is a simplified form of Particle Swarm Optimization (PSO) with a self-organizing ability that emerges in highly distributed control problems. It is versatile, robust and cost-effective for complex computing environments. The method not only recognizes known attacks but also filters noisy and irrelevant data. It is evaluated on the Knowledge Discovery and Data Mining (KDD Cup 1999) dataset and compared with a new hybrid Particle Swarm Optimization with Random Forest (PSO-RF) and with other benchmark classifiers. The test results show that the proposed method provides competitively high detection rates and produces a near-optimal solution.

KEYWORDS
Swarm intelligence, Simplified Swarm Optimization, Particle Swarm Optimization, Random Forest, Intrusion detection.

1. INTRODUCTION
The widespread use of computers and the internet has improved the quality of life for many people, but it has also exposed them to increasing security threats, both external and internal. The security of a computer system is compromised when an intrusion takes place [1].
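The abstract describes SSO as a simplified form of PSO. The following Python sketch is illustrative only and is not the paper's hybrid SSO for pre-processing. It assumes the commonly cited SSO update rule. Under that rule, a uniform random number is drawn for each variable of a particle. Depending on where that number falls relative to the thresholds cw < cp < cg, the variable is kept, copied from the particle's personal best, copied from the global best, or reset at random. The function name `sso_minimize`, the threshold values and the sphere test function are hypothetical choices made for this example.

```python
import random
from typing import Callable, List, Tuple


def sso_minimize(
    fitness: Callable[[List[float]], float],
    dim: int,
    lower: float,
    upper: float,
    n_particles: int = 30,
    n_iters: int = 200,
    cw: float = 0.15,  # hypothetical threshold: keep current value
    cp: float = 0.45,  # hypothetical threshold: copy personal best
    cg: float = 0.95,  # hypothetical threshold: copy global best
    seed: int = 0,
) -> Tuple[List[float], float]:
    """Sketch of Simplified Swarm Optimization (minimization).

    For each variable, draw r ~ U[0, 1):
      r < cw        -> keep the current value
      cw <= r < cp  -> copy the particle's personal best (pbest)
      cp <= r < cg  -> copy the global best (gbest)
      r >= cg       -> reset to a random value in [lower, upper]
    """
    if not 0.0 <= cw <= cp <= cg <= 1.0:
        raise ValueError("thresholds must satisfy 0 <= cw <= cp <= cg <= 1")
    rng = random.Random(seed)

    # Random initial swarm; each particle's pbest starts at its own position.
    swarm = [[rng.uniform(lower, upper) for _ in range(dim)]
             for _ in range(n_particles)]
    pbest = [p[:] for p in swarm]
    pbest_fit = [fitness(p) for p in swarm]
    g = min(range(n_particles), key=lambda i: pbest_fit[i])
    gbest, gbest_fit = pbest[g][:], pbest_fit[g]

    for _ in range(n_iters):
        for i, particle in enumerate(swarm):
            # Unlike PSO, no velocity is kept: each variable is chosen
            # from one of four sources by a single random draw.
            for j in range(dim):
                r = rng.random()
                if r < cw:
                    pass  # keep the current value
                elif r < cp:
                    particle[j] = pbest[i][j]
                elif r < cg:
                    particle[j] = gbest[j]
                else:
                    particle[j] = rng.uniform(lower, upper)
            f = fitness(particle)
            # Update the personal best, and the global best if improved.
            if f < pbest_fit[i]:
                pbest[i], pbest_fit[i] = particle[:], f
                if f < gbest_fit:
                    gbest, gbest_fit = particle[:], f
    return gbest, gbest_fit


if __name__ == "__main__":
    # Hypothetical test problem: minimize the sphere function sum(x_i^2).
    sphere = lambda x: sum(v * v for v in x)
    best, best_fit = sso_minimize(sphere, dim=2, lower=-5.0, upper=5.0)
    print(best, best_fit)
```

One visible simplification relative to PSO is that this sketch needs no velocity vector and no inertia or acceleration coefficients. Only three thresholds are required. Adapting it to feature selection, for example by using binary feature masks, would be a further assumption that the sketch does not cover.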
Different technologies have been developed and deployed to protect computer systems against network attacks, such as firewalls, message encryption, secure network protocols and password protection. Despite these intrusion prevention techniques, it is nearly impossible to have a completely secure system. As a result, Intrusion Detection Systems (IDS) have become an essential security component for detecting these threats and for identifying and tracking intruders. An IDS must achieve a high Detection Rate (DR) with a low False Alarm Rate (FAR), which is a challenging task [4]. In recent years many biologically inspired approaches have appeared in a variety of research fields, and they play a vital role in improving the efficiency and performance of intrusion detection. Swarm intelligence is one of them [2]. Techniques and algorithms in this field draw their inspiration from the behavior of insects, birds and fish, and from their unique ability to solve complex tasks as swarms. Among swarm intelligence techniques, Particle Swarm Optimization (PSO) is a popular heuristic optimization technique. However, it suffers from premature convergence on high-dimensional multimodal problems, which prevents it from reaching the best fitness value [3]. The KDDCup99 dataset used for intrusion detection is raw data that is highly susceptible to noise, missing values and inconsistency. Data pre-processing and filtering are required to improve the quality of this raw data and to make its use more efficient. The paper therefore proposes a novel Simplified Swarm Optimization (SSO) to mine the raw data. The main objective of the paper is to screen incomplete data and to reduce irrelevant features. For filtering data, SSO improves efficiency, running time and memory usage compared with PSO-RF and other classifiers. The rest of the paper is structured as follows. Section 2 presents related work based on swarm intelligence for intrusion detection datasets. Section 3 presents an overview of the framework.
Section 4 explains techniques in swarm intelligence and data mining. Section 5 explains the proposed hybrid SSO algorithm in detail and compares its efficiency with PSO-RF and other classifiers. Section 6 concludes with results based on the proposed work.

2. RELATED WORK
Intrusion Detection Systems take raw network data or audit records as input and must identify each record as normal or attack. Processing this input involves a huge volume of network traffic data, and intrusive patterns are normally hidden among irrelevant and redundant features. Researchers have found that pre-processing is needed for better results and have used various approaches. G. Sunil Kumar [5] proposed a new collaborative filtering technique for pre-processing probe attacks. It is based on hybrid classifiers built on binary particle swarm optimization and the random forests algorithm, and it classifies probe attacks in a network. Dharmendra G. Bhatti [10] proposed a data pre-processing method to reduce the false positive rate. Nowadays, biologically inspired approaches have been extensively investigated in network intrusion pattern detection. This field of study is known as “swarm intelligence” and has attracted an increasing number of researchers since the proposal of the Particle Swarm Optimization (PSO) algorithm [27] and also of