Simultaneous Feature Ranking and Classification using Ant Colony Optimization applied in Multilayer Perceptron Neural Network

Mohammad Shokouhifar 1, Fardad Farokhi 2
1 Member of Scientific Association of Electrical & Electronic Engineering
2 Assistant Professor of Electrical & Electronic Engineering
Islamic Azad University, Central Tehran Branch, Tehran, Iran
1 Shokoohi24@yahoo.com, 2 F_farokhi@iauctb.ac.ir

Abstract—This paper presents a new online feature selection method based on ant colony optimization (ACO). The proposed ACO algorithm simultaneously ranks the features and constructs a classifier that is influenced more strongly by the high-importance features. We introduce an importance factor for each feature in the multilayer perceptron (MLP) neural network structure. These factors are updated iteratively by the ACO according to the weights connecting each feature to the first hidden layer. Finally, we select the optimal feature subset, namely the one with the shortest feature vector and the best classifier performance. The classifier performance and the calculated importance factors of the features are used as heuristic information to guide the ants. Wrapper methods typically perform well in terms of classification accuracy and number of selected features, but they are very time-consuming; we therefore combine the feature selection process with the learning algorithm, aiming to reduce the computation time. We tested the proposed algorithm on several UCI datasets. To demonstrate the robustness of our algorithm, we added new features that are linear combinations of the original ones, as well as extra features that are pure noise. Experimental results show the effectiveness of our algorithm for feature selection.

Keywords—feature selection; ant colony optimization; feature ranking; classification; MLP neural network; learning algorithm

I.
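As a rough illustration of the importance-factor idea sketched in the abstract: the factors are tied to the weights connecting each input feature to the first hidden layer of the MLP. The exact formula is defined in the body of the paper; the function below is only a hypothetical reading, taking each feature's factor as its mean absolute first-layer weight, normalized over all features.

```python
import numpy as np

def feature_importance(first_layer_weights):
    """Per-feature importance factor derived from the MLP's first-layer
    weights (an illustrative assumption, not the paper's exact formula):
    mean absolute weight of each feature's connections to the first
    hidden layer, normalized to sum to 1."""
    # first_layer_weights: shape (n_features, n_hidden)
    imp = np.abs(first_layer_weights).mean(axis=1)
    return imp / imp.sum()

# toy example: 3 features, 4 hidden units
rng = np.random.default_rng(0)
W = rng.normal(size=(3, 4))
print(feature_importance(W))
```

Under this reading, a feature whose outgoing weights shrink during training receives a small factor, which the ACO can then exploit as heuristic information.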
INTRODUCTION

Feature selection (FS) is an essential problem in the field of pattern recognition; it aims to reduce the number of features used for recognition without reducing predictive accuracy. The inputs of a classifier are a set of features that have different effects on the classification performance. Some features do not increase the discriminative power of the classifier; some may be highly correlated, and some may be irrelevant to the specific classification task [1]. Sometimes acquiring these extra irrelevant features is even unsafe or risky, for example in some medical applications. A reduced feature subset requires fewer patterns in the training procedure of a classification algorithm. In addition, with fewer features the training procedure takes less time, and better classifier accuracy is achieved [2].

FS algorithms can be divided into two categories based on their subset evaluation strategy [3]: wrapper approaches (closed-loop approaches) and filter approaches (open-loop approaches). If the evaluation criterion is tied to the task of the learning algorithm, the method is a wrapper approach; such methods search the feature subset space using the estimated accuracy as a measure of feature subset suitability. If the FS algorithm performs independently of any learning algorithm, it employs the filter approach. Filter approaches mostly select features using a between-class separability criterion [3]. Another criterion that may be used is mutual information [4]-[6]. FS methods based on mutual information mainly consider the dependence between features and select features that are highly independent of one another. Wrapper approaches often perform better than filter approaches in terms of solution quality; however, they require more running time.
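A mutual-information filter of the kind cited in [4]-[6] can be sketched in a few lines. This is a generic illustration, not the specific criterion of any cited method; the function names and the simple argsort-based ranking are our own, and the features are assumed to be already discretized.

```python
import numpy as np

def mutual_information(x, y):
    """Empirical mutual information I(X;Y) in bits between two discrete arrays."""
    mi = 0.0
    for xv in np.unique(x):
        for yv in np.unique(y):
            p_xy = np.mean((x == xv) & (y == yv))   # joint probability
            if p_xy > 0:
                p_x = np.mean(x == xv)              # marginal probabilities
                p_y = np.mean(y == yv)
                mi += p_xy * np.log2(p_xy / (p_x * p_y))
    return mi

def rank_features(X, y):
    """Filter-style ranking: score each (discrete) feature column against
    the class labels, independently of any learning algorithm."""
    scores = [mutual_information(X[:, j], y) for j in range(X.shape[1])]
    return np.argsort(scores)[::-1]                 # most informative first
```

Because no classifier is trained during the ranking, this runs much faster than a wrapper, which is exactly the trade-off described above.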
These two approaches can also be divided into three main categories based on their subset search strategy: exhaustive, heuristic, and random search. In the simplest case, the optimal feature subset can be found by evaluating all possible subsets, which is known as exhaustive search. Branch and bound [7] and best-first search techniques [8] belong to this category. An exhaustive search obviously always yields the best solution quality, but it is impractical in terms of computation time even for medium-sized datasets, because for N features there are 2^N − 1 possible feature combinations. As a result, we must find solutions that represent a trade-off between solution quality and running time. FS algorithms therefore usually use heuristic or random search strategies to avoid the complexity of exhaustive search, although the optimality of the final selected subset is often reduced compared with exhaustive search [9]. Sequential selection approaches such as sequential forward selection (SFS) and sequential backward selection (SBS) [10] belong to the heuristic search category. SFS starts with an empty set and gradually adds the best features until an optimal subset is achieved, while SBS starts with the complete feature set and gradually discards the worst features until an optimal subset is retained. The disadvantage of SFS and SBS is that once a feature has been selected/removed, it cannot be discarded/reselected later. Sequential forward/backward floating selection approaches [11] were proposed to flexibly add and remove features. Among the many methods proposed for the FS problem, evolutionary optimization algorithms such as

2010 The 3rd International Conference on Machine Vision (ICMV 2010), 978-1-4244-8888-9/10/$26.00 © 2010 IEEE
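The greedy SFS procedure described above can be sketched as follows. The subset-quality function `evaluate` is a placeholder of our own: in a wrapper it would be an estimated classification accuracy, in a filter a separability or mutual-information criterion. Note how, once appended, a feature is never removed, which is precisely the limitation that floating selection [11] addresses.

```python
def sfs(features, evaluate, k):
    """Sequential forward selection: start from the empty set and greedily
    add the single feature that most improves `evaluate(subset)`,
    stopping after k features have been chosen. SBS is the mirror image:
    start from the full set and greedily drop the worst feature."""
    selected = []
    remaining = list(features)
    while remaining and len(selected) < k:
        best = max(remaining, key=lambda f: evaluate(selected + [f]))
        selected.append(best)         # a selected feature is never dropped again
        remaining.remove(best)
    return selected

# toy score: only features 2 and 5 are informative
score = lambda s: len(set(s) & {2, 5})
print(sfs(range(8), score, 3))        # picks features 2 and 5 first
```

Each of the k rounds evaluates up to N candidate subsets, so SFS costs O(kN) evaluations instead of the 2^N − 1 of exhaustive search.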