Bootstrap for model selection: linear approximation of the optimism

G. Simon¹, A. Lendasse², M. Verleysen¹

Université catholique de Louvain
¹ DICE - Place du Levant 3, B-1348 Louvain-la-Neuve, Belgium, Phone: +32-10-47-25-40, Fax: +32-10-47-21-80, {gsimon, verleysen}@dice.ucl.ac.be
² CESAME - Avenue G. Lemaître 4, B-1348 Louvain-la-Neuve, Belgium, lendasse@auto.ucl.ac.be

Abstract. The bootstrap resampling method may be efficiently used to estimate the generalization error of nonlinear regression models, such as artificial neural networks. Nevertheless, the use of the bootstrap implies a high computational load. In this paper we present a simple procedure that yields a fast approximation of this generalization error at a reduced computational cost. The proposal is based on empirical evidence and is embedded in a suggested simulation procedure.

1 Introduction

A large variety of models may be used to describe processes: linear models, nonlinear models, artificial neural networks, and many others. It is thus necessary to compare the various models (for example with regard to their performance and complexity) and to choose the best one. The models are ranked according to some criterion such as the generalization error, usually defined as the average error that a model would make on an unknown test set of infinite size, independent of the learning set. In practice the generalization error can only be estimated, but several methods exist to provide such an estimate: the AIC and BIC criteria and the like [1, 2, 3], as well as other well-known statistical techniques: cross-validation and its k-fold variant [3, 6], leave-one-out [3, 6], the bootstrap [4, 6] and its unbiased extension, the .632 bootstrap [4, 6]. The ideas presented in this paper can be applied to both the bootstrap and the .632 bootstrap.
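To make the bootstrap estimate of the generalization error concrete, the following minimal sketch uses the standard optimism formulation: the apparent (learning) error is corrected by the average difference between the error on the original sample and the error on the bootstrap sample. The straight-line model, the function names, the synthetic data and the choice of 200 resamples are assumptions made for illustration only; they are not the procedure proposed in this paper.

```python
import random

def fit_line(xs, ys):
    """Least-squares fit of y = a*x + b; a stand-in for any regression model."""
    n = len(xs)
    mx = sum(xs) / n
    my = sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    a = sxy / sxx if sxx > 0 else 0.0  # guard against a degenerate resample
    return a, my - a * mx

def mse(model, xs, ys):
    """Mean squared error of the fitted line on the given sample."""
    a, b = model
    return sum((y - (a * x + b)) ** 2 for x, y in zip(xs, ys)) / len(xs)

def bootstrap_generalization_error(xs, ys, n_boot=200, seed=0):
    """Return (apparent error, estimated optimism, corrected estimate)."""
    rng = random.Random(seed)
    n = len(xs)
    # Apparent error: model fitted and evaluated on the same learning set.
    apparent = mse(fit_line(xs, ys), xs, ys)
    optimism = 0.0
    for _ in range(n_boot):
        # Resample the learning set with replacement.
        idx = [rng.randrange(n) for _ in range(n)]
        bx = [xs[i] for i in idx]
        by = [ys[i] for i in idx]
        model = fit_line(bx, by)
        # Optimism of this resample: error on the original sample
        # minus error on the bootstrap sample used for fitting.
        optimism += mse(model, xs, ys) - mse(model, bx, by)
    optimism /= n_boot
    return apparent, optimism, apparent + optimism

# Hypothetical synthetic data: y = 2x + 1 plus Gaussian noise.
data_rng = random.Random(42)
xs = [data_rng.uniform(-1.0, 1.0) for _ in range(50)]
ys = [2.0 * x + 1.0 + data_rng.gauss(0.0, 0.3) for x in xs]

apparent, optimism, estimate = bootstrap_generalization_error(xs, ys)
print("apparent error:", apparent)
print("estimated optimism:", optimism)
print("bootstrap estimate:", estimate)
```

Every resample requires a full refit of the model, which is cheap for a straight line but becomes the dominant cost when each fit is a nonlinear optimization such as neural network training.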
G. Simon is funded by the Belgian F.R.I.A. M. Verleysen is Senior Research Associate of the Belgian F.N.R.S. The work of A. Lendasse is supported by the Interuniversity Attraction Poles (IAP), initiated by the Belgian Federal State, Ministry of Sciences, Technologies and Culture. The scientific responsibility rests with the authors.

J. Mira and J.R. Álvarez (Eds.): IWANN 2003, LNCS 2686, pp. 182-189, 2003. © Springer-Verlag Berlin Heidelberg 2003

Although these methods are roughly asymptotically equivalent (see for example [5] and [6]), and although the case for the bootstrap is not beyond dispute, it seems that using the bootstrap can be advantageous in many “real world”