Journal of Information Security Research, Volume 3, Number 4, December 2012

Balancing Distribution of Intrusion Detection Data Using Sample Selection

Ikram Chaïri, Souad Alaoui, Abdelouahid Lyhyaoui
LTI Lab, Department of Electrical and Industrial ENSA of Tangier
Abdelmalek Essaâdi University
BP: 1818, Tanger Principal, Tanger
Morocco
{Chairikram, lyhyaoui}@gmail.com, souad_a2002@yahoo.fr

ABSTRACT: The majority of learning systems assume that training sets are balanced; however, in real-world data this hypothesis does not always hold. The problem of between-class imbalance is a challenge that has attracted growing attention from both academia and industry because of its critical influence on the performance of learning systems. Many solutions have been proposed to resolve this problem. The common practice for dealing with imbalanced data sets is to rebalance them artificially by using sampling methods. Unfortunately, these methods cannot guarantee high learning performance. In this paper, we propose a new method based on Sample Selection (SS) to deal with the problem of between-class imbalance. We consider that creating balance between classes by retaining the examples located near the border line improves the performance of the classifier. To reduce the computational cost of selecting among all samples, we propose a clustering method as a first step in order to determine the critical centers, and then select samples from those critical clusters. Experimental results with a Multi-Layer Perceptron (MLP) architecture on a well-known Intrusion Detection data set show that our approach attains the precision of Boosting methods, which, as we will explain, can themselves be regarded as SS methods.

Keywords: Imbalanced Data, Intrusion Detection System, Nearest Opposing Pairs, Sample Selection, Boosting, Classification

Received: 12 August 2012, Revised 29 September 2012, Accepted 5 October 2012

© 2012 DLINE. All rights reserved

1. Introduction

The growth of raw data caused by the development of sciences and technologies has created an immense opportunity to improve data engineering. The problem of imbalanced data emerged as more and more researchers realized that their data sets were imbalanced and that this imbalance caused suboptimal classification performance. In recent years, the imbalanced learning problem has generated a significant amount of interest from academia, industry, and government funding agencies. The main problem was, and still is, to find a classifier that can learn from imbalanced data without ignoring the minority class.

To deal with this class imbalance problem, many solutions have been proposed, such as sampling methods, cost-function methods, kernel-based methods, and active learning methods [1], [2], [3]. Sampling methods remain the most widely used approach to the class imbalance problem [4]. Besides the problem of imbalanced data, the approximation of the misclassification error used in the learning system can also degrade the accuracy and the quality of learning. That is why a different method of sample selection has been proposed to deal with this