Towards robust feature selection techniques

Yvan Saeys yvan.saeys@psb.ugent.be
Thomas Abeel thomas.abeel@psb.ugent.be
Yves Van de Peer yves.vandepeer@psb.ugent.be
Department of Plant Systems Biology, VIB, Technologiepark 927, 9052 Gent, Belgium
Department of Molecular Genetics, Ghent University, Technologiepark 927, 9052 Gent, Belgium

Abstract

Robustness of feature selection techniques is a topic of recent interest, especially in high-dimensional domains with small sample sizes, where selected feature subsets are subsequently analysed by domain experts to gain more insight into the problem modelled. In this work, we investigate the robustness of various feature selection techniques, and provide a general scheme to improve robustness using ensemble feature selection. We show that ensemble feature selection techniques hold great promise for small sample domains, and provide more robust feature subsets than a single feature selection technique. In addition, we also investigate the effect of ensemble feature selection techniques on classification performance, giving rise to a new model selection strategy.

1. Introduction

During the past decade, the use of feature selection for knowledge discovery has become increasingly important in many domains that are characterized by a large number of features, but a small number of samples. Typical examples of such domains include text mining, computational chemistry, and the bioinformatics and biomedical field, where the number of features (problem dimensionality) often exceeds the number of samples by orders of magnitude (Saeys et al., 2007). When using feature selection in these domains, not only model performance but also robustness of the feature selection process is important, as domain experts would prefer a stable feature selection algorithm over an unstable one when only small changes are made to the dataset.
Surprisingly, the robustness (stability) of feature selection techniques is an important aspect that has received only relatively little attention in the past. Recent work in this area mainly focuses on the stability indices to be used for feature selection, introducing measures based on Hamming distance (Dunne et al., 2002), correlation coefficients (Kalousis et al., 2007), consistency (Kuncheva, 2007) and information theory (Křížek et al., 2007). The work of Kalousis et al. (2007) also presents an extensive comparative evaluation of feature selection stability over a number of high-dimensional datasets. However, most of these recent works focus only on the stability of single feature selection techniques.

In this work, we investigate whether ensemble feature selection techniques can be used to yield more robust feature selection, and whether combining multiple methods has any effect on classification performance.

2. Methods

2.1. Quantification of robustness

Depending on the outcome of a feature selection technique, the result can be either a set of weights, a ranking, or a particular feature subset. In order to assess robustness, a subsampling scheme is used that generates k subsamples, each containing 90% of the original data. The robustness of a technique is then measured by the average over all pairwise similarity comparisons between the different feature selectors:

S_tot = \frac{2 \sum_{i=1}^{k} \sum_{j=i+1}^{k} S(f_i, f_j)}{k(k-1)}

where f_i represents the outcome of the feature selection method applied to subsample i (1 ≤ i ≤ k), and S(f_i, f_j) represents a similarity measure between f_i and f_j. Here, we focus on similarities between rankings, using the Spearman rank correlation coefficient, and between subsets, using the Jaccard index (Kalousis et al., 2007) or
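The stability measure S_tot above can be sketched in a few lines of code. This is a minimal illustration, not the authors' implementation: the helper names (jaccard, spearman, s_tot) and the toy feature subsets are illustrative assumptions, and the Spearman formula shown assumes complete, tie-free rankings.

```python
# Sketch of the S_tot stability measure: average pairwise similarity
# over the outcomes of a feature selector run on k subsamples.
from itertools import combinations

def jaccard(a, b):
    """Jaccard index between two feature subsets (sets of feature indices)."""
    a, b = set(a), set(b)
    return len(a & b) / len(a | b)

def spearman(rank_a, rank_b):
    """Spearman rank correlation between two complete rankings without ties.
    rank_a[i] is the rank assigned to feature i."""
    n = len(rank_a)
    d2 = sum((ra - rb) ** 2 for ra, rb in zip(rank_a, rank_b))
    return 1 - 6 * d2 / (n * (n ** 2 - 1))

def s_tot(outcomes, similarity):
    """S_tot: 2 * sum of all pairwise similarities / (k * (k - 1))."""
    k = len(outcomes)
    total = sum(similarity(outcomes[i], outcomes[j])
                for i, j in combinations(range(k), 2))
    return 2 * total / (k * (k - 1))

# Example: feature subsets selected on k = 3 subsamples (90% of the data each).
subsets = [{0, 1, 2, 5}, {0, 1, 3, 5}, {0, 2, 3, 5}]
print(s_tot(subsets, jaccard))  # each pair shares 3 of 5 features -> 0.6
```

The same s_tot function handles rankings by passing spearman as the similarity argument, matching the two cases (rankings and subsets) considered in the text.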