4th International Conference on Inverse Problems in Engineering
Rio de Janeiro, Brazil, 2002

Selection of Multiple Regularization Parameters in Local Ridge Regression Using Evolutionary Algorithms and Prediction Risk Optimization

J. Wesley Hines, Andrei V. Gribok, Aleksey M. Urmanov
Department of Nuclear Engineering
The University of Tennessee
Knoxville, TN USA
jhines2@utk.edu, agribok@utk.edu, urmanov@utk.edu

Mark A. Buckner
Engineering Science and Technology Division
Oak Ridge National Laboratory
Oak Ridge, TN USA
buk@ornl.gov

ABSTRACT

This paper presents a new methodology for regularizing data-based predictive models. Traditional modeling using regression can produce unrepeatable, unstable, or noisy predictions when the inputs are highly correlated. Ridge regression is a regularization technique used to deal with these problems. A drawback of ridge regression is that it optimizes a single regularization parameter, whereas the methodology presented in this paper optimizes several local regularization parameters that operate independently on each component. This method allows components with significant predictive power to be passed while components with low predictive power are damped. The optimal combination of regularization parameters is computed using an Evolutionary Strategy search technique with a predictive error estimate as the objective function. Examples are presented to demonstrate the advantages of this technique.

NOMENCLATURE

X ∈ R^{n×m}          matrix of predictor variables
y                    response variable
b ∈ R^m              vector of regression coefficients
σ²                   noise variance
X = U diag(s_i) V^T  SVD of X
λ²                   ridge parameter
λ_i                  local ridge parameters

INTRODUCTION

In many engineering applications of predictive modeling, the predictor data set is collinear. For some systems, such as predictive systems used to monitor process sensor calibrations, collinear predictors are necessary for building successful and robust inferential models [1].
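As an illustrative sketch of the local ridge idea described above (not the paper's implementation), the solution can be written via the SVD X = U diag(s_i) V^T as b = V diag(s_i / (s_i² + λ_i²)) U^T y, with one parameter λ_i per component; the data below are hypothetical and assume NumPy:

```python
import numpy as np

def local_ridge(X, y, lam):
    """Local ridge solution b = V diag(s_i / (s_i^2 + lam_i^2)) U^T y.

    lam is a vector of per-component regularization parameters lam_i;
    setting all lam_i equal recovers ordinary ridge regression with
    ridge parameter lam^2.
    """
    U, s, Vt = np.linalg.svd(X, full_matrices=False)
    filt = s / (s**2 + np.asarray(lam, dtype=float)**2)  # per-component filter factors
    return Vt.T @ (filt * (U.T @ y))

# Hypothetical collinear data for illustration only
rng = np.random.default_rng(0)
X = rng.standard_normal((50, 3))
X[:, 2] = X[:, 0] + 1e-3 * rng.standard_normal(50)  # nearly collinear column
y = X @ np.array([1.0, 2.0, 0.0]) + 0.1 * rng.standard_normal(50)

# Damp only the weakest (last) SVD component; pass the strong ones through
b = local_ridge(X, y, lam=[0.0, 0.0, 1.0])
```

Setting all λ_i to a common value λ reproduces the ordinary ridge estimate (X^T X + λ² I)^{-1} X^T y, which is a quick consistency check for the sketch.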
Due to the presence of collinearity, traditional empirical modeling techniques such as ordinary least squares, neural network multi-layer perceptrons, and others that do not employ regularization produce very unstable and unrepeatable results [2]. Examples exist in most research fields. To deal with instabilities due to collinear inputs, the method of regularization first developed by Tikhonov [3] was adopted in the form of ridge regression [4] or a more general class of penalized estimators [5]. When ordinary least squares (OLS) is applied to a data set with collinear inputs, the coefficients are usually very large in magnitude. These large coefficients are caused by overfitting the training data; they amplify noise in the predictors and produce useless predictions. This problem can be avoided by adding constraints to the usual sum-of-squared-error objective function. The most common method, termed ridge regression, adds a term that also minimizes the magnitude of the regression coefficients. In his paper, Hoerl [4] proved that, regardless of the conditioning, for finite data sets there always exists a ridge estimate that decreases the mean squared error of the solution. This means that even if the data matrix is not badly ill-conditioned, one can still improve prediction accuracy by using ridge regression rather than OLS. Adding the constraint biases the estimate but reduces its variance, making it more stable: the probability that the ridge estimate falls within a given vicinity of the true parameter value is higher than that of the OLS estimate.
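As a minimal illustration of this effect (hypothetical data, not from the paper), the following sketch compares OLS and ridge coefficients on two nearly collinear predictors; the ridge parameter λ² = 0.1 is chosen arbitrarily:

```python
import numpy as np

rng = np.random.default_rng(1)
n = 100
x1 = rng.standard_normal(n)
x2 = x1 + 1e-4 * rng.standard_normal(n)  # nearly collinear predictor
X = np.column_stack([x1, x2])
y = x1 + x2 + 0.1 * rng.standard_normal(n)

# OLS: the near-singular X^T X inflates the coefficients and makes
# them highly sensitive to the noise realization
b_ols = np.linalg.lstsq(X, y, rcond=None)[0]

# Ridge: penalizing ||b||^2 with lambda^2 = 0.1 shrinks the coefficients
# toward stable values near the true [1, 1]
lam2 = 0.1
b_ridge = np.linalg.solve(X.T @ X + lam2 * np.eye(2), X.T @ y)
```

On such data the OLS coefficients take large, nearly offsetting values (their sum is well determined, their difference is not), while the ridge coefficients stay close to the true values at the cost of a small bias.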