Robust ensemble feature selection for high dimensional data sets Ben Brahim Afef LARODEC ISGT, University of Tunis Email: afef.benbrahim@yahoo.fr Limam Mohamed LARODEC ISGT, University of Tunis Dhofar University, Oman Email: mohamed.limam@isg.rnu.tn Abstract—Feature selection is an important and frequently used technique in data preprocessing for performing data mining on large scale data sets. Several feature selection methods exist in the literature; each uses a specific feature evaluation criterion and may produce a different feature subset even when applied to the same data set. No single resulting subset is better than the others: each is a best subset of the whole feature space under its own criterion. Finding a way to take advantage of different feature selection methods simultaneously is a challenging data mining problem. Recently, the concept of ensemble feature selection has been introduced to help solve this problem. Multiple feature selections are combined in order to produce more robust feature subsets and better classification results. However, one of the most critical decisions when performing ensemble feature selection is the aggregation technique used to combine the feature lists produced by the multiple algorithms into a single decision for each feature. In this paper, we propose a robust feature aggregation technique to combine the results of three different filter methods. Our aggregation technique measures each feature selection algorithm's confidence and its conflict with the other algorithms in order to assign a reliability factor guiding the final feature selection. Experiments on high dimensional data sets show that the proposed approach outperforms the single feature selection algorithms as well as two well known aggregation methods in terms of classification performance. I.
INTRODUCTION

In many real world situations, we are increasingly faced with problems characterized by a large number of features, not all of which are relevant to the problem at hand. Feeding learning algorithms with all the features may cause serious problems to many machine learning algorithms with respect to scalability and learning performance. Therefore, feature selection is considered to be one of the current challenges in statistical machine learning for high-dimensional data. Feature selection reduces the dimensionality of the feature space, avoiding the well known curse of dimensionality problem, defined in [1] as the sensitivity of a method to variations in the training set. Feature selection is the process of choosing a subset of the original features so that the feature space is optimally reduced according to a certain evaluation criterion. High dimensional data is a real challenge to many existing feature selection methods with respect to efficiency and effectiveness.

Feature selection methods can be divided into three categories: filters, wrappers, and embedded methods [2], [3]. Filter methods are applied directly to the data set and generally assign relevance scores to features by looking only at the intrinsic properties of the data. High scoring features are presented as input to the classification algorithm. Filter methods ignore feature dependencies, and this is their main shortcoming. Wrapper and embedded methods, on the other hand, generally use a specific learning algorithm to evaluate a specific subset of features. Wrapper approaches use a performance measure of a learning algorithm to guide the feature subset search and have the ability to take feature dependencies into account; however, they are computationally intensive, especially if building the learning algorithm has a high computational cost [4]. Embedded methods use the internal parameters of some learning algorithms to evaluate features.
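As a concrete illustration of the filter model, the sketch below ranks features by a simple univariate score, the absolute Pearson correlation between each feature and the class label. The score and the toy data are illustrative assumptions only; they are not the specific filter methods used in this paper.

```python
import numpy as np

def filter_rank(X, y, k):
    """Rank features with a simple univariate filter score: the
    absolute Pearson correlation between each feature column and the
    class label, looking only at intrinsic properties of the data."""
    scores = np.array([abs(np.corrcoef(X[:, j], y)[0, 1])
                       for j in range(X.shape[1])])
    # Indices of the k highest-scoring features, best first
    return np.argsort(scores)[::-1][:k]

# Toy data: 20 samples, 5 random features; feature 0 carries the signal
rng = np.random.default_rng(0)
y = rng.integers(0, 2, 20)
X = rng.normal(size=(20, 5))
X[:, 0] += 3.0 * y          # make feature 0 strongly class-correlated
selected = filter_rank(X, y, 2)
print(selected)             # feature 0 should rank first
```

Because the score inspects each feature in isolation, the ranking is fast and classifier-independent, but, as noted above, it cannot capture feature dependencies.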
The search for an optimal subset of features is built into the classifier construction, which is why they are less computationally intensive than wrapper methods. When the number of features becomes very large, the filter model is usually chosen as it is computationally efficient, fast and independent of the classification algorithm.

We propose in this paper an ensemble feature selection approach to deal with high dimensional data. In this approach, an ensemble of different filter methods is first applied to the data set in order to choose the best subsets for a given cardinality. Given that each filter uses a specific feature evaluation criterion, we cannot say that one resulting subset is better than the others, but rather that all the obtained subsets are best subsets of the whole feature space under their own criteria. For this reason, we naturally turn to ensemble learning [5] as a way to combine independent feature subsets in order to obtain a hopefully more robust feature subset. After this step, an SVM classifier is trained on each of the projections of the resulting feature subsets on the training data. Cross validation is used to obtain the classification performance of each setting. This classification performance is used to measure the reliability of the selected features. Initial feature weights obtained from the base filter algorithms are adjusted based on the features' corresponding reliability. To get a final subset, a robust aggregation technique is used to select the best features from the different individual subsets based on their adjusted weights. Thus, the simplicity and speed of filters are exploited to select the best feature subsets among the whole feature space, and the ability of a classification algorithm to provide an associated classification performance is exploited to guide the choice of
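The reliability-weighted aggregation step can be sketched as follows. The weighted-sum combination and the toy weights and reliabilities are simplifying assumptions for illustration; the paper's actual technique is based on confidence and conflict measures, and the reliabilities would come from cross-validated SVM accuracy rather than being fixed by hand.

```python
import numpy as np

def aggregate(weight_lists, reliabilities, k):
    """Combine per-filter feature weights into one selection: scale
    each filter's weights by its reliability (e.g. the cross-validated
    accuracy of an SVM trained on that filter's subset), sum the
    scaled weights per feature, and keep the top-k features."""
    combined = np.zeros(len(weight_lists[0]))
    for w, r in zip(weight_lists, reliabilities):
        combined += r * np.asarray(w)
    return np.argsort(combined)[::-1][:k]

# Hypothetical weights from three filters over 4 features, with
# reliabilities 0.9, 0.6 and 0.5 standing in for CV accuracies
weights = [[0.9, 0.1, 0.4, 0.2],
           [0.2, 0.8, 0.3, 0.1],
           [0.3, 0.2, 0.7, 0.1]]
final = aggregate(weights, [0.9, 0.6, 0.5], 2)
print(final)   # features 0 and 2 win under the weighted sum
```

Note how the high-reliability first filter dominates: feature 0 is kept despite being ranked first by only one of the three filters, which is the intended effect of adjusting the initial weights by reliability.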