Novel Methods for the Feature Subset Ensembles Approach

Mohamed A. Aly, Electrical Engineering, Caltech, Pasadena, CA 91125, mohamedadaly@gmail.com
Amir F. Atiya, Dept. of Computer Engineering, Cairo University, Giza, Egypt, amiratiya@link.net

Abstract

The ensemble learning technique has attracted much attention in the past few years. Instead of using a single prediction model, this approach employs a number of diverse, accurate prediction models to do the job. Many methods have been proposed to build such accurate, diverse ensembles, of which bagging and boosting are the most popular. Another method, called Feature Subset Ensembles (FSE), is thoroughly investigated in this work. This technique builds ensembles by assigning each individual prediction model in the ensemble a distinct feature subset from the pool of available features. In this paper several novel variations on the basic FSE are proposed. Extensive experiments are carried out to compare the proposed FSE variants with the basic FSE approach.

1 Introduction

Over the history of machine learning, a single learning model was typically built to solve the problem at hand. From a set of candidate prediction models or networks, only one is chosen to do the job. However, this single model may not be the best one available. Moreover, supplementing it with other prediction models can prove advantageous in improving prediction accuracy. The technique of using multiple prediction models to solve the same problem is known as ensemble learning, and it has proved its effectiveness over the last few years.

The ensemble approach has been an active research topic in the past few years [8, 9]. In this approach, a group of prediction models is trained and used instead of just employing the single best prediction model. The outputs of the individual prediction models are combined, using simple averaging or voting for example, to produce the ensemble output.
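To make the combination step concrete, the following minimal sketch (using made-up per-model predictions, purely for illustration) averages regressor outputs and takes a majority vote over classifier labels:

```python
import numpy as np

# Hypothetical outputs of a 3-model ensemble on 4 test points.
# For regression, the ensemble output is the mean of the members' predictions.
reg_preds = np.array([
    [2.1, 0.4, 1.0, 3.2],   # model 1
    [1.9, 0.6, 1.2, 2.8],   # model 2
    [2.0, 0.5, 0.8, 3.0],   # model 3
])
ensemble_reg = reg_preds.mean(axis=0)

# For classification, each member votes for a class label; the ensemble
# output is the most frequent label per test point.
cls_preds = np.array([
    [0, 1, 1, 2],   # model 1
    [0, 1, 0, 2],   # model 2
    [1, 1, 0, 2],   # model 3
])

def majority_vote(votes):
    # np.bincount tallies votes per class; argmax picks the most frequent one.
    return np.array([np.bincount(col).argmax() for col in votes.T])

ensemble_cls = majority_vote(cls_preds)
print(ensemble_reg)
print(ensemble_cls)
```

Weighted averaging or more elaborate combiners can replace the simple mean and vote, but these two rules are the ones most commonly used in the ensemble literature.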
This technique has been shown, both theoretically and empirically, to significantly outperform the single prediction model approach [35, 19]. Using multiple prediction models can prevent a single prediction model from overfitting the data and can decrease the variance of its predictions. However, for the ensemble to perform well, the component prediction models not only need to be accurate, they also need to be diverse, i.e. their generalization errors should be as weakly correlated as possible [18, 5]. This is intuitive, because nothing is gained from combining prediction models that give identical predictions. Researchers have developed many ways of producing accurate, diverse prediction model ensembles [8, 9, 5]. The most popular methods are bagging [4] and boosting [13, 14]. Both bagging and boosting are based on creating an ensemble of networks, each trained on a different subset of the training examples.

A relatively novel ensemble approach, called feature subset ensembles (FSE), has been proposed in the literature. It has appeared under several different names, but we propose to unify the naming under this term. The approach is based on creating an ensemble of networks, where each network operates on a different subset of the features (input variables). The FSE approach has not yet been sufficiently explored in the literature, and this paper serves to shed some light on its capabilities. Specifically, we propose several variants of the FSE approach and conduct a large-scale comparison study. Some of the proposed variants turn out to outperform the basic FSE.

2 The Feature Subset Ensembles Approach

Feature selection is a very important part of the preprocessing phase in machine learning [23, 8, 25] and statistical pattern recognition [37, 22, 19, 27]. In many real-world situations, we face problems having hundreds or thousands of features, some of which are irrelevant to the problem at hand.
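As a concrete illustration of the FSE scheme described above, the sketch below partitions the feature pool among the ensemble members, trains each member only on its own subset, and combines the members by majority vote. The toy data, the choice of decision trees, and all parameter settings are illustrative assumptions, not the paper's experimental setup:

```python
import numpy as np
from sklearn.tree import DecisionTreeClassifier

rng = np.random.default_rng(0)

# Toy data: 200 examples, 12 features; only the first few features
# carry the class signal (an illustrative assumption).
X = rng.normal(size=(200, 12))
y = (X[:, :4].sum(axis=1) > 0).astype(int)

n_models = 4
# Disjoint partition of the 12 features into 4 subsets of 3, so every
# available feature is used by exactly one ensemble member.
perm = rng.permutation(X.shape[1])
subsets = np.array_split(perm, n_models)

models = []
for s in subsets:
    m = DecisionTreeClassifier(max_depth=3, random_state=0)
    m.fit(X[:, s], y)          # each member sees only its feature subset
    models.append((m, s))

def fse_predict(X_new):
    # Majority vote over the members' class predictions.
    votes = np.stack([m.predict(X_new[:, s]) for m, s in models])
    return (votes.mean(axis=0) >= 0.5).astype(int)

preds = fse_predict(X)
print("training accuracy:", (preds == y).mean())
```

The FSE variants studied later differ mainly in how the subsets are formed (disjoint partitions versus sampled, possibly overlapping, subsets) while the train-and-combine skeleton stays the same.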
Feeding learning algorithms all the features can degrade performance, as the algorithm can get stuck trying to figure out which features are useful and which are not. Therefore, feature selection is employed as a preliminary step to select a subset of the input features that contains the potentially more useful ones. In addition, feature selection reduces the dimensionality of the feature space, avoiding the well-known curse of dimensionality [19].

The disadvantage of feature subset selection is that some features that seem less important, and are thus discarded, may bear valuable information. It seems a waste to throw away such information, which could possibly contribute in some way to improving model performance. This is where the Feature Subset Ensemble (FSE) approach comes into play. It simply partitions the input features among the individual prediction models in the ensemble; hence, no information is discarded. It utilizes all the available information in the training set, while at the same time not overloading a single prediction model with all the features, which may lead to poor learning.

Let us give an illustrative example. Assume we have ten