Feature Ranking Utilizing Support Vector Machines' SVs

Nahla Barakat
Faculty of Informatics and Computer Science
British University in Egypt (BUE)
Cairo, Egypt

Abstract—The classification performance of different algorithms can often be improved by excluding irrelevant input features. This has been the main motivation behind the significant number of studies proposing different families of feature selection techniques. The objective is to find a subset of features that can describe the input space at least as well as the original set of features. In this paper, we propose a hybrid feature-ranking method for support vector machines (SVMs) that utilizes the SVMs' support vectors (SVs). The method first finds the subset of features that contribute least to interclass separation. These features are then re-ranked, as a final step, using a correlation-based feature selection algorithm. Results on four benchmark medical data sets show that the proposed method, though simple, can be a promising feature reduction method for SVMs and for other families of classifiers as well.

Index Terms—feature ranking, support vector machines.

I. INTRODUCTION

The performance of classification algorithms can be affected by the presence of irrelevant input features [1]. This has been the main motivation behind the increasing interest in feature selection and ranking studies over the last two decades. The main objective of these studies is to obtain a subset of features which can describe the problem domain at least as well as the original, full set of features [2].

Feature selection algorithms can be classified into three broad categories: filter, wrapper and embedded methods [2–4]. Hybrid approaches, however, have recently been proposed, which combine concepts from filter and wrapper techniques [5].

Filter methods exclude irrelevant features in a preprocessing step, before applying the induction algorithm. The individual input features are ordered according to a predefined measure (e.g.,
principal/independent component analysis, correlation criteria, Fisher's discriminant scores, etc.) [1]. However, filter methods may not be the best choice in the case of nonlinear relationships among input features [6].

In wrapper-based approaches, the prediction performance of a given classifier is used to assess the importance of a subset of features [3]. In particular, different candidate subsets are evaluated according to the classifier's performance, and the subset of features which produces the lowest classification error is considered the most relevant. This is achieved using a search procedure over the space of all possible feature subsets, e.g., greedy, forward or backward methods [6]. Forward search methods start with an empty set of features and progressively add features until the best classification performance is reached. Backward selection starts with the full set of features and then eliminates the least relevant features. However, wrapper methods are computationally expensive [6].

Embedded methods are usually performed as part of the induction algorithm itself. An example of such methods is the SVM-RFE algorithm, an SVM weight-based method [7].

Several application domains have benefited from feature selection, e.g., medical data mining, biomedical informatics, gene selection in microarray data, and text mining, to name a few.

In this paper, we propose a simple method for feature ranking that utilizes support vector machines' (SVMs) support vectors (SVs), building on the concept of interclass separability. SVMs have been chosen due to their superior classification performance, demonstrated over a wide range of problem domains.
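As a concrete illustration of this idea, the sketch below trains a linear SVM, splits its support vectors by class, flags the features whose class-conditional SV means do not differ significantly, and re-ranks those features by their absolute correlation with the class label. This is a minimal sketch, assuming scikit-learn, SciPy, and a synthetic dataset; the variable names and the 0.05 significance threshold are illustrative choices, not taken from the paper.

```python
import numpy as np
from scipy.stats import ttest_ind, pearsonr
from sklearn.datasets import make_classification
from sklearn.svm import SVC

# Synthetic binary task (dimensions and seed are illustrative).
X, y = make_classification(n_samples=300, n_features=10,
                           n_informative=4, random_state=0)

clf = SVC(kernel="linear", C=1.0).fit(X, y)

# Support vectors, split by the class of the corresponding training point.
sv = clf.support_vectors_        # shape (n_SV, n_features)
sv_labels = y[clf.support_]      # class label of each support vector
sv_pos, sv_neg = sv[sv_labels == 1], sv[sv_labels == 0]

# Step 1: per-feature Welch t-test between positive and negative SVs;
# features whose SV means do NOT differ significantly contribute
# least to interclass separation.
pvals = np.array([ttest_ind(sv_pos[:, j], sv_neg[:, j],
                            equal_var=False).pvalue
                  for j in range(X.shape[1])])
least_relevant = np.where(pvals > 0.05)[0]   # 0.05 threshold assumed

# Step 2: re-rank only those candidate features by their absolute
# Pearson correlation with the label.
scores = [abs(pearsonr(X[:, j], y)[0]) for j in least_relevant]
ranking = least_relevant[np.argsort(scores)]
print("candidate irrelevant features, least relevant first:", ranking)
```

Here a simple per-feature correlation with the label stands in for the full correlation-based feature selection pass; the paper's actual procedure is given in Section IV.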
The main idea of the method is to find and rank the features that least discriminate the positive from the negative class (in the case of binary classification tasks), in two steps. The first step identifies each individual feature for which the difference between its means over the positive and negative SVs is not statistically significant. The second step applies correlation-based feature selection, restricted to the least relevant features obtained in the first step.

The proposed method has been evaluated in terms of accuracy, recall, precision and area under the precision–recall curve (PRC). The obtained results show improved performance for SVMs as well as for other families of classifiers when trained with the most relevant features, compared to the performance with the full set of features.

The paper is organized as follows: Section II briefly highlights feature selection methods for SVMs, while Section III provides a brief background on SVMs and the PRC. Section IV introduces the proposed method, followed by the experimental methodology, then results and discussion in Sections V and VI. The paper is concluded in Section VII.

II. RELATED WORK

In the context of SVMs, there have been five main themes of research in the area of feature selection: formulation of an optimization problem [8], sparse SVMs

978-1-4799-0048-0/13/$31.00 ©2013 IEEE