Neural Computation, vol. 14, pp. 1481-1506

LOCAL OVERFITTING CONTROL VIA LEVERAGES

Gaétan MONARI*, **, Gérard DREYFUS*

*École Supérieure de Physique et de Chimie Industrielles de la Ville de Paris, Laboratoire d'Électronique, 10, rue Vauquelin - F 75005 PARIS - FRANCE
**USINOR, DSI/DISA, SOLLAC FOS, bat. LB1, F 13776 FOS-sur-Mer Cedex - FRANCE

ABSTRACT

We present a novel approach to dealing with overfitting in black-box models. It is based on the leverages of the samples, i.e. on the influence that each observation has on the parameters of the model. Since overfitting is the consequence of the model specializing on specific data points during training, we present a selection method for nonlinear models, which is based on the estimation of leverages and confidence intervals. It allows both the selection among various models of equivalent complexities corresponding to different minima of the cost function (e.g. neural nets with the same number of hidden units), and the selection among models having different complexities (e.g. neural nets with different numbers of hidden units). A complete model selection methodology is derived.

1. INTRODUCTION

The traditional view of overfitting refers mostly to the bias/variance tradeoff, introduced in (Geman & al., 1992): a family of parameterized functions with too few parameters, with respect to the complexity of a problem, is said to have too large a bias, because it cannot fit the deterministic model underlying the data. Conversely, when the model is overparameterized, the dependence of the resulting functions on the particular training set is too large, and so is the variance of the corresponding family of parameterized functions. Therefore, overfitting is usually detected by the fact that the modeling error on a test set is much larger than the modeling error on the training data.
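The two notions used so far, the gap between training and test error and the leverage of each sample, can be illustrated with a minimal sketch. It assumes a polynomial model that is linear in its parameters, for which the leverages are, by the standard definition, the diagonal elements of the hat matrix Z (ZᵀZ)⁻¹ Zᵀ; it does not show how leverages are estimated for nonlinear models such as neural networks. The data-generating function, the noise level, the sample sizes, the polynomial degrees, and the helper names `design_matrix` and `fit_and_score` are all hypothetical choices made for illustration.

```python
import numpy as np


def design_matrix(x, degree):
    # Columns 1, x, x^2, ..., x^degree: a model that is linear in its parameters.
    return np.vander(x, degree + 1, increasing=True)


def fit_and_score(x_train, y_train, x_test, y_test, degree):
    """Least-squares fit; returns (train MSE, test MSE, leverages)."""
    Z = design_matrix(x_train, degree)
    theta, *_ = np.linalg.lstsq(Z, y_train, rcond=None)
    train_mse = np.mean((Z @ theta - y_train) ** 2)
    test_mse = np.mean((design_matrix(x_test, degree) @ theta - y_test) ** 2)
    # Hat matrix H = Z (Z^T Z)^{-1} Z^T = Q Q^T, with Q from the reduced QR
    # factorization of Z, so the leverages are the squared row norms of Q.
    Q, _ = np.linalg.qr(Z)
    leverages = np.sum(Q ** 2, axis=1)
    return train_mse, test_mse, leverages


# Hypothetical data: noisy samples of sin(3x) on [-1, 1].
rng = np.random.default_rng(0)
x_train = rng.uniform(-1.0, 1.0, 20)
y_train = np.sin(3.0 * x_train) + 0.2 * rng.standard_normal(20)
x_test = rng.uniform(-1.0, 1.0, 200)
y_test = np.sin(3.0 * x_test) + 0.2 * rng.standard_normal(200)

results = {}
for degree in (3, 12):  # a small model and a heavily parameterized one
    train_mse, test_mse, lev = fit_and_score(
        x_train, y_train, x_test, y_test, degree
    )
    results[degree] = (train_mse, test_mse, lev)
    print(
        f"degree={degree:2d}  train MSE={train_mse:.4f}  "
        f"test MSE={test_mse:.4f}  max leverage={lev.max():.3f}  "
        f"sum of leverages={lev.sum():.2f}"
    )
```

In this setting each leverage lies between 0 and 1, and the leverages sum to the number of parameters, so a sample whose leverage is close to 1 is one that the model has devoted nearly a full parameter to fitting. Comparing the printed training and test errors for the two degrees gives a concrete view of the detection criterion described above; the exact numbers depend on the random draw.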
In practice, there are two major ways of preventing overfitting: a priori, by limiting the variance of the considered family of parameterized functions. These regularization methods include weight decay (see (MacKay, 1992) for a Bayesian