Using Early-Stopping to Avoid Overfitting in Wrapper-Based Feature Selection Employing Stochastic Search

John Loughrey  john.loughrey@cs.tcd.ie
Pádraig Cunningham  padraig.cunningham@cs.tcd.ie
Trinity College Dublin, College Green, Dublin 2, Ireland

Abstract

It is acknowledged that overfitting can occur in feature selection using the wrapper method when there is a limited amount of training data available. It has also been shown that the severity of overfitting is related to the intensity of the search algorithm used during this process. In this paper we show that two stochastic search techniques (Simulated Annealing and Genetic Algorithms) that can be used for wrapper-based feature selection are susceptible to overfitting in this way. However, because of their stochastic nature, these algorithms can be stopped early to prevent overfitting. We present a framework that implements early-stopping for both of these stochastic search techniques and we show that this is successful in reducing the effects of overfitting and in increasing generalisation accuracy in most cases.

1. Introduction

The benefits of wrapper-based techniques for feature selection are well established (Kohavi & Sommerfield, 1995). However, it has recently been recognised that wrapper-based techniques have the potential to overfit the training data (Reunanen, 2003). That is, feature subsets that perform well on the training data may not perform as well on data not used in the training process. Furthermore, the extent of the overfitting is related to the depth of the search. Reunanen (2003) shows that, whereas Sequential Forward Floating Selection (SFFS) beats Sequential Forward Selection (SFS) on the data used in the training process, the reverse is true on hold-out data. He argues that this is because SFFS is a more intensive search process, i.e. it explores more states.
In this paper we show that this tendency to overfit can be quite acute in stochastic search algorithms such as Genetic Algorithms (GA) and Simulated Annealing (SA), as these algorithms are able to explore the search space intensively. We show that early-stopping is an effective strategy for preventing overfitting in feature selection using SA or GA. It is worth noting that the applicability of early-stopping depends on the stochastic nature of the search. This idea would not be readily applicable in more directed search strategies such as the SFFS and SFS strategies evaluated by Reunanen (2003) or the standard Backward Elimination strategy that is popular in wrapper-based feature selection.

In (Loughrey & Cunningham, 2004) we approach this problem using a modified genetic algorithm (GA) that stops the search early in order to avoid overfitting, and we find that the results are favourable. In this paper we show that SA is amenable to a neat form of early-stopping. Optimisation using SA is analogous to the cooling of metals, and we show how the SA can be quenched so that the search freezes before overfitting can occur. In section 3.2 we show how SA can be sped up to avoid overfitting and in section 3.3 we show how to calibrate this process using cross-validation.

The paper is organised as follows. We begin in section 2 with a discussion of the wrapper-based approach to feature selection and an illustration of the potential overfitting problem. The early-stopping solution to overfitting is described in section 3. The approach is evaluated on SA and GA in section 4 and the paper concludes with some suggestions for future work in section 5.
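To make the quenching idea concrete, the following is a minimal Python sketch, not the authors' implementation: the evaluation function, feature count, and cooling parameters are all hypothetical stand-ins for a real wrapper evaluation (e.g. cross-validated accuracy of a classifier on the selected features). The point it illustrates is that a faster (quenched) cooling schedule freezes the simulated-annealing search after far fewer subset evaluations, giving it less opportunity to overfit the training-set estimate.

```python
import math
import random

random.seed(0)

N_FEATURES = 10
RELEVANT = {0, 1, 2}  # hypothetical "truly useful" features

def training_accuracy(mask):
    """Stand-in for a wrapper evaluation on training data.
    Relevant features raise the score; irrelevant selected features
    contribute noise that an intensive search can overfit to."""
    hits = sum(1 for i in RELEVANT if mask[i])
    noise = sum(random.gauss(0, 0.02) for i, b in enumerate(mask)
                if b and i not in RELEVANT)
    return 0.6 + 0.1 * hits + noise

def sa_feature_search(t0=1.0, alpha=0.8, t_min=1e-3):
    """Simulated-annealing search over feature subsets.
    A smaller alpha cools (quenches) faster, so the search freezes
    after fewer evaluations -- the early-stopping effect."""
    mask = tuple(random.randint(0, 1) for _ in range(N_FEATURES))
    score = training_accuracy(mask)
    t = t0
    evaluations = 0
    while t > t_min:
        # Propose a neighbour by flipping one feature in or out.
        i = random.randrange(N_FEATURES)
        cand = mask[:i] + (1 - mask[i],) + mask[i + 1:]
        cand_score = training_accuracy(cand)
        evaluations += 1
        delta = cand_score - score
        # Accept improvements always; accept worsenings with
        # probability exp(delta / t), which shrinks as t falls.
        if delta > 0 or random.random() < math.exp(delta / t):
            mask, score = cand, cand_score
        t *= alpha  # geometric cooling schedule
    return mask, score, evaluations

# A quenched run (alpha=0.5) explores far fewer states than a
# slow anneal (alpha=0.95).
_, _, fast_evals = sa_feature_search(alpha=0.5)
_, _, slow_evals = sa_feature_search(alpha=0.95)
```

With geometric cooling, the number of evaluations is roughly log(t_min/t0)/log(alpha), so the cooling rate directly controls search intensity; section 3.3's cross-validation step would then choose how aggressively to quench.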