An unsupervised feature selection algorithm based on ant colony optimization

Sina Tabakhi, Parham Moradi*, Fardin Akhlaghian
Department of Computer Engineering, University of Kurdistan, Sanandaj, Iran

Article history: Received 29 August 2013; received in revised form 2 March 2014; accepted 11 March 2014.

Keywords: Feature selection; Dimensionality reduction; Univariate technique; Multivariate technique; Filter approach; Ant colony optimization

Abstract

Feature selection is a combinatorial optimization problem that selects the most relevant features from an original feature set to increase the performance of classification or clustering algorithms. Most feature selection methods are supervised and use class labels as a guide. Unsupervised feature selection, on the other hand, is a more difficult problem due to the unavailability of class labels. In this paper, we present an unsupervised feature selection method based on ant colony optimization, called UFSACO. The method seeks to find the optimal feature subset through several iterations without using any learning algorithm. Moreover, feature relevance is computed from the similarity between features, which leads to the minimization of redundancy. The method can therefore be classified as a filter-based multivariate method. The proposed method has low computational complexity and can thus be applied to high-dimensional datasets. We compare the performance of UFSACO to 11 well-known univariate and multivariate feature selection methods using different classifiers (support vector machine, decision tree, and naïve Bayes). The experimental results on several frequently used datasets show the efficiency and effectiveness of the UFSACO method as well as improvements over previous related methods.

© 2014 Elsevier Ltd. All rights reserved.

1. Introduction

The amount of data has been growing rapidly in recent years. Data mining, a computational process involving methods at the intersection of machine learning, statistics, and databases, deals with this huge volume of data, processing and analyzing it (Liu and Yu, 2005). The purpose of data mining is to find knowledge in datasets and express it in a comprehensible structure. A major problem in data mining applications such as pattern recognition is the curse of dimensionality, in which the number of features is larger than the number of patterns, leading to a large number of classifier parameters (e.g., weights in a neural network). As a result, classifier performance may be reduced, and the computational complexity of processing the data is significantly increased (Theodoridis and Koutroumbas, 2008). Moreover, in the presence of many irrelevant and redundant features, data mining methods tend to overfit the data, which decreases their generalization. Consequently, a common way to overcome this problem is to reduce dimensionality by removing irrelevant and redundant features and selecting a subset of useful features from the input feature set.

Feature selection is one of the important and frequently used techniques in data preprocessing for data mining. It brings immediate benefits to applications, such as speeding up a data mining algorithm and improving mining performance (Akadi et al., 2008; Ferreira and Figueiredo, 2012; Lai et al., 2006; Yu and Liu, 2003). Feature selection has been applied in many fields such as text categorization (Chen et al., 2006; Uğuz, 2011; Yang et al., 2011), face recognition (Kanan and Faez, 2008; Yan and Yuan, 2004), cancer classification (Guyon et al., 2002; Yu et al., 2009; Zibakhsh and Abadeh, 2013), finance (Huang and Tsai, 2009; Marinakis et al., 2009), and customer relationship management (Kuri-Morales and Rodríguez-Erazo, 2009).
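The redundancy-minimization idea behind filter-based multivariate selection can be illustrated with a minimal sketch. This is not the UFSACO algorithm itself; the function names and the greedy strategy are illustrative assumptions, showing only how pairwise feature similarity (here, absolute Pearson correlation) can drive selection without class labels.

```python
import math

def pearson(x, y):
    """Absolute Pearson correlation between two feature vectors.

    Assumes non-constant features (non-zero variance)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return abs(cov / (sx * sy))

def select_low_redundancy(features, k):
    """Greedily pick k features (rows of `features`), each minimally
    similar to the features already chosen.

    Starting from feature 0 is a hypothetical seed choice."""
    selected = [0]
    while len(selected) < k:
        best, best_score = None, float("inf")
        for j in range(len(features)):
            if j in selected:
                continue
            # Redundancy = maximum similarity to any already-selected feature.
            score = max(pearson(features[j], features[i]) for i in selected)
            if score < best_score:
                best, best_score = j, score
        selected.append(best)
    return selected
```

For example, given three features where the second is a scaled copy of the first, selecting two features skips the redundant copy and keeps the uncorrelated one.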
Feature selection is the process of selecting a subset of features from a larger set, which reduces the dimensionality of the feature space for a successful classification task. The whole search space contains all possible subsets of features, so its size is 2^n, where n is the number of features. Many problems related to feature selection have therefore been shown to be NP-hard, and finding the optimal feature subset is usually intractable in a reasonable time (Liu and Motoda, 2007; Meiri and Zahavi, 2006; Narendra and Fukunaga, 1977; Peng et al., 2005). To overcome this time complexity problem, approximation algorithms have been proposed that find a near-optimal feature subset in polynomial time. These algorithms can be classified into four categories: filter, wrapper, embedded, and hybrid approaches (Gheyas and Smith, 2010; Liu and Motoda, 2007; Liu

* Corresponding author. Tel.: +98 8716668513. E-mail addresses: sina.tabakhi@ieee.org (S. Tabakhi), p.moradi@uok.ac.ir (P. Moradi), f.akhlaghian@uok.ac.ir (F. Akhlaghian).

Engineering Applications of Artificial Intelligence 32 (2014) 112–123. ISSN 0952-1976. http://dx.doi.org/10.1016/j.engappai.2014.03.007
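The 2^n search space discussed above can be made concrete with a short sketch. The function names are illustrative, not from the paper: the point is simply that an exhaustive search must score every non-empty subset, while a greedy forward pass, one typical polynomial-time approximation, scores only a few hundred candidates even for moderately large n.

```python
def exhaustive_evaluations(n):
    """Candidate subsets an exhaustive search must score:
    all non-empty subsets of n features, i.e. 2^n - 1."""
    return 2 ** n - 1

def greedy_forward_evaluations(n, k):
    """Subsets scored by a greedy forward pass that grows a subset
    to size k: n candidates in round 1, n - 1 in round 2, and so on."""
    return sum(n - i for i in range(k))

# Even at n = 30 features, exhaustive search is already out of reach,
# while the greedy approximation stays cheap.
print(exhaustive_evaluations(30))          # 1073741823
print(greedy_forward_evaluations(30, 10))  # 255
```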