Feature Ranking Utilizing Support Vector
Machines’ SVs
Nahla Barakat
Faculty of Informatics and Computer Science
British University in Egypt (BUE)
Cairo, Egypt
Abstract—Classification performance of many algorithms
can often be improved by excluding irrelevant input features.
This has motivated the significant number of studies proposing
different families of feature selection techniques, whose
objective is to find a subset of features that describes the input
space at least as well as the original set of features. In this
paper, we propose a hybrid feature-ranking method for
support vector machines (SVMs) that utilizes the SVMs'
support vectors (SVs). The method first finds the subset of
features that contribute least to interclass separation; these
features are then re-ranked, as a final step, using a
correlation-based feature selection algorithm. Results on four
benchmark medical data sets show that the proposed method,
though simple, is a promising feature reduction method for
SVMs as well as other families of classifiers.
Index Terms—feature ranking, support vector machines.
I. INTRODUCTION
Performance of classification algorithms can be degraded
by the presence of irrelevant input features [1]. This has been
the main motivation behind the growing interest in feature
selection and ranking over the last two decades. The main
objective of these studies is to obtain a subset of features
that describes the problem domain at least as well as the
original, full set of features [2].
Feature selection algorithms can be grouped into three
broad categories: filter, wrapper, and embedded methods [2-
4]. Recently, however, hybrid approaches have been
proposed that combine concepts from filter and wrapper
techniques [5].
Filter methods exclude irrelevant features in a
preprocessing step, before the induction algorithm is applied.
Individual input features are ordered according to a
predefined measure (e.g., principal/independent component
analysis, correlation criteria, Fisher's discriminant scores)
[1]. However, filter methods may perform poorly when there
are nonlinear relationships among the input features [6].
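As an illustration, a filter-style ranking based on Fisher's discriminant score can be sketched as follows. This is a minimal example on synthetic data; the score shown, the squared mean difference divided by the sum of per-class variances, is one common variant of the criterion.

```python
import numpy as np

def fisher_scores(X, y):
    """Fisher's discriminant score per feature:
    (difference of class means)^2 / (sum of class variances)."""
    pos, neg = X[y == 1], X[y == 0]
    num = (pos.mean(axis=0) - neg.mean(axis=0)) ** 2
    den = pos.var(axis=0) + neg.var(axis=0) + 1e-12  # avoid division by zero
    return num / den

# Synthetic data: feature 0 carries the class signal, the rest are noise.
rng = np.random.default_rng(0)
y = rng.integers(0, 2, 200)
X = rng.normal(size=(200, 3))
X[:, 0] += 2.0 * y

scores = fisher_scores(X, y)
ranking = np.argsort(scores)[::-1]  # most relevant feature first
```

Being univariate, this criterion scores each feature in isolation, which is exactly why it can miss features that are only jointly informative.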
In wrapper-based approaches, the prediction performance
of a given classifier is used to assess the importance of a
subset of features [3]. In particular, different candidate
subsets are evaluated according to the classifier's
performance, and the subset that produces the lowest
classification error is taken as the most relevant. This is
achieved by searching the space of all possible feature
subsets, e.g., with greedy, forward, or backward methods [6].
Forward search starts with an empty set of features and
progressively adds features until the best classification
performance is reached. Backward selection starts with the
full set of features and then eliminates the least relevant
ones. Wrapper methods are, however, computationally
expensive [6].
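Forward wrapper search can be sketched as below. A nearest-centroid classifier is used here purely as a stand-in for the wrapped induction algorithm; any classifier could be substituted.

```python
import numpy as np

def centroid_accuracy(X_tr, y_tr, X_te, y_te):
    """Held-out accuracy of a nearest-class-centroid classifier
    (a stand-in for the wrapped induction algorithm)."""
    c0 = X_tr[y_tr == 0].mean(axis=0)
    c1 = X_tr[y_tr == 1].mean(axis=0)
    pred = (np.linalg.norm(X_te - c1, axis=1)
            < np.linalg.norm(X_te - c0, axis=1)).astype(int)
    return (pred == y_te).mean()

def forward_select(X_tr, y_tr, X_te, y_te, k):
    """Greedy forward search: start empty, repeatedly add the
    feature that most improves held-out accuracy."""
    chosen, remaining = [], list(range(X_tr.shape[1]))
    while len(chosen) < k:
        best_f, best_acc = None, -1.0
        for f in remaining:
            cols = chosen + [f]
            acc = centroid_accuracy(X_tr[:, cols], y_tr,
                                    X_te[:, cols], y_te)
            if acc > best_acc:
                best_f, best_acc = f, acc
        chosen.append(best_f)
        remaining.remove(best_f)
    return chosen

# Synthetic data: only feature 1 carries the class signal.
rng = np.random.default_rng(1)
y_tr = rng.integers(0, 2, 200)
y_te = rng.integers(0, 2, 100)
X_tr = rng.normal(size=(200, 4))
X_te = rng.normal(size=(100, 4))
X_tr[:, 1] += 3.0 * y_tr
X_te[:, 1] += 3.0 * y_te

selected = forward_select(X_tr, y_tr, X_te, y_te, k=2)
```

The cost is evident even in this sketch: each round retrains the classifier once per remaining candidate feature, which is what makes wrapper methods expensive at scale.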
Embedded methods are usually performed as part of the
induction algorithm. An example of such methods is the
SVM-RFE algorithm, an SVM weight-based method [7].
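The core loop of SVM-RFE can be sketched as follows, assuming scikit-learn is available: at each iteration a linear SVM is retrained on the surviving features and the feature with the smallest squared weight is eliminated. This is a simplified single-feature-per-round version.

```python
import numpy as np
from sklearn.svm import SVC  # assumes scikit-learn is installed

def svm_rfe(X, y, n_keep):
    """Recursive feature elimination: repeatedly retrain a linear SVM
    and drop the surviving feature with the smallest squared weight."""
    active = list(range(X.shape[1]))
    while len(active) > n_keep:
        clf = SVC(kernel="linear").fit(X[:, active], y)
        weights_sq = (clf.coef_ ** 2).ravel()
        active.pop(int(np.argmin(weights_sq)))
    return active

# Synthetic data: features 0 and 2 carry the class signal.
rng = np.random.default_rng(2)
y = rng.integers(0, 2, 200)
X = rng.normal(size=(200, 4))
X[:, 0] += 3.0 * y
X[:, 2] += 3.0 * y

kept = svm_rfe(X, y, n_keep=2)
```

Because the weight vector is re-estimated after every elimination, the ranking accounts for interactions among the remaining features, unlike a one-shot weight sort.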
Several application domains have benefited from
feature selection, e.g., medical data mining, biomedical
informatics, gene selection in microarray data, and text
mining, to name a few.
In this paper, we propose a simple method for feature
ranking that utilizes support vector machine (SVM) support
vectors (SVs), building on the concept of interclass
separability. SVMs were chosen for their superior
classification performance, demonstrated over a wide range
of problem domains. The main idea is to find and rank the
features that least discriminate the positive from the
negative class (for binary classification tasks), in two
steps: the first step identifies each feature for which the
difference between its means over the positive and negative
SVs is not statistically significant; the second step applies
correlation-based feature selection only to the least relevant
features obtained in the first step.
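The first step above can be sketched as follows. This is an illustrative reading only: a linear kernel and Welch's two-sample t-test on each feature over the SVs are assumed, and scikit-learn and SciPy are assumed available; the paper's exact kernel and significance test may differ.

```python
import numpy as np
from scipy.stats import ttest_ind   # Welch's two-sample t-test
from sklearn.svm import SVC         # assumes scikit-learn is installed

def least_relevant_by_svs(X, y, alpha=0.05):
    """Step one: train an SVM, then for each feature test whether its
    mean over positive-class SVs differs significantly from its mean
    over negative-class SVs. Features with p > alpha are flagged as
    least relevant (step two would re-rank these with a
    correlation-based feature selection algorithm)."""
    clf = SVC(kernel="linear").fit(X, y)
    sv, sv_y = X[clf.support_], y[clf.support_]
    pos, neg = sv[sv_y == 1], sv[sv_y == 0]
    pvals = np.array([
        ttest_ind(pos[:, j], neg[:, j], equal_var=False).pvalue
        for j in range(X.shape[1])
    ])
    return pvals, np.where(pvals > alpha)[0]

# Synthetic data: feature 0 separates the classes, the rest are noise.
rng = np.random.default_rng(3)
y = rng.integers(0, 2, 300)
X = rng.normal(size=(300, 3))
X[:, 0] += 3.0 * y

pvals, least = least_relevant_by_svs(X, y)
```

Restricting the test to the SVs, rather than the full training set, focuses the comparison on the points that actually define the decision boundary.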
The proposed method has been evaluated in terms of
accuracy, recall, precision, and area under the precision-
recall curve (PRC). The results show improved performance
for SVMs as well as other families of classifiers when
trained on the most relevant features, compared with
training on the full set of features.
The paper is organized as follows: Section II briefly
reviews feature selection methods for SVMs, while
Section III provides brief background on SVMs and the PRC.
Section IV introduces the proposed method, followed by the
experimental methodology, results, and discussion in
Sections V and VI. The paper is concluded in Section VII.
II. RELATED WORK
In the context of SVMs, there have been five main
themes of research in the area of feature selection:
formulation of an optimization problem [8], sparse SVMs
978-1-4799-0048-0/13/$31.00 ©2013 IEEE