Uncertainty estimation for multivariate regression coefficients

Nicolaas (Klaas) M. Faber

Department of Production and Control Systems, ATO, PO Box 17, 6700 AA Wageningen, The Netherlands

Received 18 June 2002; received in revised form 22 August 2002; accepted 27 August 2002

Abstract

Five methods are compared for assessing the uncertainty in multivariate regression coefficients, namely, an approximate variance expression and four resampling methods (jack-knife, bootstrapping objects, bootstrapping residuals, and noise addition). The comparison is carried out for simulated as well as real near-infrared data. The calibration methods considered are ordinary least squares (simulated data), partial least squares regression, and principal component regression (real data). The results suggest that the approximate variance expression is a viable alternative to resampling.
© 2002 Elsevier Science B.V. All rights reserved.

Keywords: Multivariate calibration; Regression vector; Uncertainty estimation; Resampling; Jack-knife; Bootstrap; Monte Carlo simulation; OLS; PLSR; PCR; NIR

1. Introduction

Typically, applications of multivariate models are concerned with the prediction of a property of interest. To achieve an acceptable predictive ability, the uncertainty in the model parameters, i.e., the regression coefficients, should not be too large. In keeping with this principle, Centner et al. [1] eliminated variables for which the regression coefficients carry a relatively large uncertainty. They used jack-knifing to estimate this uncertainty when partial least squares regression (PLSR) is used for calibration. In the chemometrics literature, two alternatives to the jack-knife have been proposed for assessing the uncertainty in multivariate regression coefficients. Wehrens and Van der Linden [2] used the bootstrap in connection with principal component regression (PCR).
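As a minimal illustration of the jack-knife and bootstrap (object resampling) approaches mentioned above, the following sketch estimates the standard errors of ordinary least squares regression coefficients. The simulated data, sample sizes, and variable names are hypothetical and are not taken from the paper.

```python
# Illustrative sketch (not the paper's implementation): jack-knife and
# object-bootstrap standard errors of OLS regression coefficients.
import numpy as np

rng = np.random.default_rng(0)

def ols_coef(X, y):
    # Ordinary least squares solution b of y = X b
    return np.linalg.lstsq(X, y, rcond=None)[0]

# Hypothetical calibration data: n objects, p variables
n, p = 50, 3
X = rng.normal(size=(n, p))
b_true = np.array([1.0, -2.0, 0.5])
y = X @ b_true + 0.1 * rng.normal(size=n)

# Jack-knife: refit with one object left out at a time
b_jack = np.array([ols_coef(np.delete(X, i, axis=0), np.delete(y, i))
                   for i in range(n)])
b_bar = b_jack.mean(axis=0)
se_jack = np.sqrt((n - 1) / n * ((b_jack - b_bar) ** 2).sum(axis=0))

# Bootstrapping objects: resample rows (objects) with replacement
B = 500
b_boot = np.empty((B, p))
for k in range(B):
    idx = rng.integers(0, n, size=n)
    b_boot[k] = ols_coef(X[idx], y[idx])
se_boot = b_boot.std(axis=0, ddof=1)

print("jack-knife SE:", se_jack)
print("bootstrap SE: ", se_boot)
```

The spread of the refitted coefficients across the perturbed data sets supplies the uncertainty estimate; for well-behaved data the two resampling schemes give similar standard errors.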
By contrast, Faber and Kowalski [3] derived approximate variance expressions for PLSR and PCR (see their Sections 3.3.3, 3.3.4, and 3.3.5). These expressions account for all sources of measurement error and accommodate heteroskedastic as well as correlated noise.

The jack-knife and bootstrap are resampling methods [4]. Briefly, resampling amounts to generating new data sets from the available one by introducing an artificial perturbation. The desired uncertainty estimate follows from the spread in the results obtained for the new data sets. This approach essentially assumes that the artificial perturbation mimics the effect of the real perturbation already present in the original data set. Another resampling method, which has received less attention in the chemometrics literature, is the noise addition method. Excellent discussions of this method are available. Press et al. [5] treat con-

E-mail address: n.m.faber@ato.wag-ur.nl (N.M. Faber).
Chemometrics and Intelligent Laboratory Systems 64 (2002) 169-179