A FEATURE SUBSET SELECTION METHOD BASED ON CONDITIONAL MUTUAL INFORMATION AND ANT COLONY OPTIMIZATION

Syed Imran Ali
Department of Computer Science, National University of Computer and Emerging Sciences, Islamabad, Pakistan
s.emran.a@gmail.com

Dr. Waseem Shahzad
Department of Computer Science, National University of Computer and Emerging Sciences, Islamabad, Pakistan
waseem.shahzad@nu.edu.pk

Abstract

Feature subset selection is an important problem in a number of fields, namely data mining, machine learning, and pattern recognition. It refers to the problem of selecting useful features that are neither irrelevant nor redundant. Since most data acquired from different sources are not in a suitable form for mining useful patterns, feature selection is applied to the data to filter out useless features. However, feature selection is a combinatorial optimization problem, so exhaustively generating and evaluating all possible subsets is intractable in terms of computational cost, memory usage, and processing time. Hence a mechanism is required that intelligently searches for a useful set of features in polynomial time. In this study, a feature subset selection algorithm based on conditional mutual information and ant colony optimization is proposed. The proposed method is a pure filter-based feature subset selection technique that incurs less computational cost and is proficient in terms of classification accuracy. Moreover, along with high accuracy, it selects fewer features. Extensive experimentation is performed on thirteen benchmark datasets with a number of well-known classification algorithms. Empirical results support the efficiency and effectiveness of the proposed method.

Keywords

Feature Subset Selection; Conditional Mutual Information; Symmetric Uncertainty; Ant Colony Optimization; Classification.

1. INTRODUCTION

In the past, data was transformed into knowledge manually through data analysis and interpretation.
This manual data analysis was highly subjective, slow, and costly. As data generation and recording escalated considerably, manual data analysis became tedious and impractical in many domains. This motivated the need for an efficient and automated knowledge discovery process. It is estimated that the amount of information in the world doubles every 20 months. This explosion of data is due to the digital acquisition, generation, storage, and retrieval of data. Because data are being generated at such a fast pace, huge amounts of data are not being analyzed, owing to the shortage of efficient data analysis mechanisms. Moreover, it is very difficult to analyze data in its entirety. Raw data need to be processed in a way that helps in their analysis and transformation into a more meaningful form, i.e. knowledge. To analyze data in an automatic or semi-automatic manner, “Knowledge Discovery in Databases” (KDD) was formulated. Data reduction is one of the key elements of the KDD process. Since data are often not gathered with a specific purpose in mind, these datasets may contain redundant and irrelevant attributes. Inclusion of these attributes can be detrimental to knowledge discovery and can mislead the process. Moreover, the processing time required to analyze these features can increase the overall processing cost. Feature Subset Selection (FSS) is one of the key types of data reduction. The main objective of this step is to find useful features that represent the data and to remove those features that are either irrelevant or redundant. A useful feature is neither irrelevant nor redundant: an irrelevant feature does not provide any useful information for predicting the target concept, and a redundant feature does not add extra information that might be useful for predicting the target concept [1]. FSS helps in a number of ways, e.g.
it removes useless features to save computing time and data storage, relevant features improve predictive performance and help prevent over-fitting, and the selected features provide a more appropriate description of the target concept. Feature selection is a combinatorial optimization problem: a feature set containing N features, where N is any positive integer, has 2^N possible subsets, which can be too many to search exhaustively. There are two main categories of feature selection algorithms, i.e. filter-based methods and wrapper-based methods [1, 2, 3, 4]. Filter-based methods are those that perform FSS independently of any learning algorithm using some