An unsupervised feature selection algorithm based on ant colony optimization

Sina Tabakhi, Parham Moradi*, Fardin Akhlaghian
Department of Computer Engineering, University of Kurdistan, Sanandaj, Iran

Article history: Received 29 August 2013; Received in revised form 2 March 2014; Accepted 11 March 2014

Keywords: Feature selection; Dimensionality reduction; Univariate technique; Multivariate technique; Filter approach; Ant colony optimization

Abstract

Feature selection is a combinatorial optimization problem that selects the most relevant features from an original feature set to increase the performance of classification or clustering algorithms. Most feature selection methods are supervised and use the class labels as a guide. Unsupervised feature selection, by contrast, is a more difficult problem due to the unavailability of class labels. In this paper, we present an unsupervised feature selection method based on ant colony optimization, called UFSACO. The method seeks the optimal feature subset through several iterations without using any learning algorithm. Moreover, feature relevance is computed from the similarity between features, which leads to the minimization of redundancy; the method can therefore be classified as a filter-based multivariate method. The proposed method has a low computational complexity, so it can be applied to high-dimensional datasets. We compare the performance of UFSACO to 11 well-known univariate and multivariate feature selection methods using different classifiers (support vector machine, decision tree, and naïve Bayes). The experimental results on several frequently used datasets show the efficiency and effectiveness of the UFSACO method as well as improvements over previous related methods.

© 2014 Elsevier Ltd. All rights reserved.

1. Introduction

The amount of data has been growing rapidly in recent years, and data mining, a computational process involving methods at the intersection of machine learning, statistics, and databases, processes and analyzes this huge volume of data (Liu and Yu, 2005). The purpose of data mining is to extract knowledge from datasets and express it in a comprehensible structure. A major problem associated with data mining applications such as pattern recognition is the "curse of dimensionality," in which the number of features is larger than the number of patterns, leading to a large number of classifier parameters (e.g., weights in a neural network). As a result, classifier performance may degrade, and the computational complexity of processing the data increases significantly (Theodoridis and Koutroumbas, 2008). Moreover, in the presence of many irrelevant and redundant features, data mining methods tend to overfit the data, which decreases their generalization ability. Consequently, a common way to overcome this problem is to reduce dimensionality by removing irrelevant and redundant features and selecting a subset of useful features from the input feature set.

Feature selection is one of the most important and frequently used techniques in data preprocessing for data mining. It brings immediate benefits to applications, such as speeding up a data mining algorithm and improving mining performance (Akadi et al., 2008; Ferreira and Figueiredo, 2012; Lai et al., 2006; Yu and Liu, 2003). Feature selection has been applied in many fields, such as text categorization (Chen et al., 2006; Uğuz, 2011; Yang et al., 2011), face recognition (Kanan and Faez, 2008; Yan and Yuan, 2004), cancer classification (Guyon et al., 2002; Yu et al., 2009; Zibakhsh and Abadeh, 2013), finance (Huang and Tsai, 2009; Marinakis et al., 2009), and customer relationship management (Kuri-Morales and Rodríguez-Erazo, 2009).
Feature selection is the process of selecting a subset of features from a larger feature set, which reduces the dimensionality of the feature space for a successful classification task. The whole search space contains all possible subsets of features, so its size is 2^n, where n is the number of features. Many problems related to feature selection have therefore been shown to be NP-hard, and finding the optimal feature subset is usually intractable in a reasonable time (Liu and Motoda, 2007; Meiri and Zahavi, 2006; Narendra and Fukunaga, 1977; Peng et al., 2005). To overcome this time complexity problem, approximation algorithms have been proposed that find a near-optimal feature subset in polynomial time. These algorithms can be classified into four categories: filter, wrapper, embedded, and hybrid approaches (Gheyas and Smith, 2010; Liu and Motoda, 2007; Liu

* Corresponding author. Tel.: +98 8716668513. E-mail addresses: sina.tabakhi@ieee.org (S. Tabakhi), p.moradi@uok.ac.ir (P. Moradi), f.akhlaghian@uok.ac.ir (F. Akhlaghian).

Engineering Applications of Artificial Intelligence 32 (2014) 112–123. ISSN 0952-1976. http://dx.doi.org/10.1016/j.engappai.2014.03.007
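To make the 2^n search-space figure above concrete, the following illustrative Python sketch (not part of the paper's method; the function name `all_subsets` is our own) enumerates every non-empty candidate feature subset and shows how quickly the count grows with the number of features:

```python
from itertools import combinations

def all_subsets(n_features):
    """Yield every non-empty subset of feature indices.

    There are 2**n_features - 1 such subsets, so exhaustive
    evaluation becomes intractable as n_features grows.
    """
    for k in range(1, n_features + 1):
        yield from combinations(range(n_features), k)

# The search space roughly doubles with every added feature:
for n in (10, 20, 30):
    print(f"{n} features -> {2 ** n - 1:,} candidate subsets")
```

This exponential blow-up is exactly why the approximation approaches named above (filter, wrapper, embedded, hybrid) examine only a small fraction of the subsets rather than all of them.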