Using Early-Stopping to Avoid Overfitting in Wrapper-Based Feature Selection Employing Stochastic Search

John Loughrey  john.loughrey@cs.tcd.ie
Pádraig Cunningham  padraig.cunningham@cs.tcd.ie
Trinity College Dublin, College Green, Dublin 2, Ireland

Abstract

It is acknowledged that overfitting can occur in feature selection using the wrapper method when there is a limited amount of training data available. It has also been shown that the severity of overfitting is related to the intensity of the search algorithm used during this process. In this paper we show that two stochastic search techniques (Simulated Annealing and Genetic Algorithms) that can be used for wrapper-based feature selection are susceptible to overfitting in this way. However, because of their stochastic nature, these algorithms can be stopped early to prevent overfitting. We present a framework that implements early-stopping for both of these stochastic search techniques and we show that this is successful in reducing the effects of overfitting and in increasing generalisation accuracy in most cases.

1. Introduction

The benefits of wrapper-based techniques for feature selection are well established (Kohavi & Sommerfield, 1995). However, it has recently been recognised that wrapper-based techniques have the potential to overfit the training data (Reunanen, 2003). That is, feature subsets that perform well on the training data may not perform as well on data not used in the training process. Furthermore, the extent of the overfitting is related to the depth of the search. Reunanen (2003) shows that, whereas Sequential Forward Floating Selection (SFFS) beats Sequential Forward Selection (SFS) on the data used in the training process, the reverse is true on hold-out data. He argues that this is because SFFS is a more intensive search process, i.e. it explores more states.
In this paper we show that this tendency to overfit can be quite acute in stochastic search algorithms such as Genetic Algorithms (GA) and Simulated Annealing (SA), as these algorithms are able to explore the search space intensively. We show that early-stopping is an effective strategy for preventing overfitting in feature selection using SA or GA. It is worth noting that the applicability of early-stopping depends on the stochastic nature of the search. This idea would not be readily applicable in more directed search strategies such as the SFFS and SFS strategies evaluated by Reunanen (2003) or the standard Backward Elimination strategy that is popular in wrapper-based feature selection.

In (Loughrey & Cunningham, 2004) we approach this problem using a modified genetic algorithm (GA) that stops the search early in order to avoid overfitting, and we find that the results are favourable. In this paper we show that SA is amenable to a neat form of early-stopping. Optimisation using SA is analogous to the cooling of metals, and we show how the SA can be quenched so that the search freezes before overfitting can occur. In section 3.2 we show how SA can be sped up to avoid overfitting and in section 3.3 we show how to calibrate this process using cross-validation.

The paper is organised as follows. We begin in section 2 with a discussion of the wrapper-based approach to feature selection and an illustration of the potential overfitting problem. The early-stopping solution to overfitting is described in section 3. The approach is evaluated on SA and GA in section 4 and the paper concludes with some suggestions for future work in section 5.
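To make the quenching idea concrete, the following is a minimal Python sketch, not the authors' implementation: the evaluation function, feature count, and cooling parameters are all hypothetical stand-ins for a real wrapper evaluation (e.g. cross-validated accuracy of a classifier on the selected features). The point it illustrates is that a faster (quenched) cooling schedule freezes the simulated-annealing search after far fewer subset evaluations, giving it less opportunity to overfit the training-set estimate.

```python
import math
import random

random.seed(0)

N_FEATURES = 10
RELEVANT = {0, 1, 2}  # hypothetical "truly useful" features

def training_accuracy(mask):
    """Stand-in for a wrapper evaluation on training data.
    Relevant features raise the score; irrelevant selected features
    contribute noise that an intensive search can overfit to."""
    hits = sum(1 for i in RELEVANT if mask[i])
    noise = sum(random.gauss(0, 0.02) for i, b in enumerate(mask)
                if b and i not in RELEVANT)
    return 0.6 + 0.1 * hits + noise

def sa_feature_search(t0=1.0, alpha=0.8, t_min=1e-3):
    """Simulated-annealing search over feature subsets.
    A smaller alpha cools (quenches) faster, so the search freezes
    after fewer evaluations -- the early-stopping effect."""
    mask = tuple(random.randint(0, 1) for _ in range(N_FEATURES))
    score = training_accuracy(mask)
    t = t0
    evaluations = 0
    while t > t_min:
        # Propose a neighbour by flipping one feature in or out.
        i = random.randrange(N_FEATURES)
        cand = mask[:i] + (1 - mask[i],) + mask[i + 1:]
        cand_score = training_accuracy(cand)
        evaluations += 1
        delta = cand_score - score
        # Accept improvements always; accept worsenings with
        # probability exp(delta / t), which shrinks as t falls.
        if delta > 0 or random.random() < math.exp(delta / t):
            mask, score = cand, cand_score
        t *= alpha  # geometric cooling schedule
    return mask, score, evaluations

# A quenched run (alpha=0.5) explores far fewer states than a
# slow anneal (alpha=0.95).
_, _, fast_evals = sa_feature_search(alpha=0.5)
_, _, slow_evals = sa_feature_search(alpha=0.95)
```

With geometric cooling, the number of evaluations is roughly log(t_min/t0)/log(alpha), so the cooling rate directly controls search intensity; section 3.3's cross-validation step would then choose how aggressively to quench.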