Chemometrics and Intelligent Laboratory Systems 25 (1994) 313-323, Elsevier

Comparing the predictive accuracy of models using a simple randomization test

Hilko van der Voet

Agricultural Mathematics Group (GLW-DLO), P.O. Box 100, 6700 AC Wageningen, The Netherlands

Received 23 March 1994; accepted 16 August 1994

Abstract

A simple randomization t-test is proposed for testing the equality of performance of two prediction methods. The application of the test is shown to prevent unjustified conclusions about method superiority. Previous approaches to the problem of comparing predictive methods are discussed, and the proposed test is compared to other tests for paired data in a small simulation study. It is shown that the test can also be applied for classification problems where the predicted entity is qualitative rather than quantitative.

1. Introduction

A primary purpose of a model is to predict certain traits of interest in the modelled system. This applies equally well to statistical models, e.g. a partial least squares model predicting moisture in cheese from near-infrared spectra, as to mechanistic models, e.g. complex dynamic crop models predicting maize yield in relation to climatic conditions. It is even true of the simplest model of all, the mean of a set of measurements. This is often interpreted, albeit implicitly, as the value to be expected for future observations under the same circumstances.

Accuracy of predictions is therefore a central theme when comparing different models for the same situation. Such models may differ because they use different data for input, or they may use the same data but differ radically in model structure, or they may be just minor variations within the same model family, for example models with a different number of components in partial least squares regression.
The predictive ability of any model can be judged from the distribution of prediction errors obtained when the model is used to predict the response of independent cases. One characteristic of this distribution, the mean squared error of prediction (MSEP), is often used as a simple criterion for the predictive ability of a model. If $y$ denotes the trait of interest to be predicted by the model, and $\hat{y}$ the prediction from the model, then MSEP is defined by

$$\mathrm{MSEP} = E(y - \hat{y})^2 \qquad (1)$$

where $E$ denotes the expectation over a target population of individual cases. In this paper the situation is considered where a representative and independent sample of size $n$ from this target population is available for evaluation, that is, both reference values $y_i$ and predictions $\hat{y}_i$ are known for $i = 1, \ldots, n$. MSEP is then estimated by

$$\widehat{\mathrm{MSEP}} = \frac{1}{n} \sum_{i=1}^{n} (y_i - \hat{y}_i)^2 \qquad (2)$$

0169-7439/94/$07.00 © 1994 Elsevier Science B.V. All rights reserved
SSDI 0169-7439(94)00064-6
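As a minimal sketch of the estimator in Eq. (2), the following Python fragment computes the estimated MSEP for two sets of predictions of the same reference values. The function name `estimate_msep` and all numbers are hypothetical and chosen only for illustration; they do not come from the paper.

```python
from typing import Sequence


def estimate_msep(y: Sequence[float], y_hat: Sequence[float]) -> float:
    """Estimate MSEP as the mean of the squared prediction errors (Eq. 2)."""
    if len(y) != len(y_hat):
        raise ValueError("y and y_hat must have the same length")
    if len(y) == 0:
        raise ValueError("at least one case is required")
    n = len(y)
    # Sum of squared differences between reference values and predictions,
    # divided by the number of evaluation cases n.
    return sum((yi - yhi) ** 2 for yi, yhi in zip(y, y_hat)) / n


# Hypothetical reference values and predictions from two models (made-up data).
y_ref = [10.0, 12.0, 9.0, 11.0]
pred_a = [10.5, 11.0, 9.5, 11.0]
pred_b = [9.0, 12.0, 10.0, 13.0]

msep_a = estimate_msep(y_ref, pred_a)  # squared errors: 0.25, 1.0, 0.25, 0.0
msep_b = estimate_msep(y_ref, pred_b)  # squared errors: 1.0, 0.0, 1.0, 4.0
print(msep_a, msep_b)
```

With these made-up numbers the estimate is 0.375 for the first model and 1.5 for the second. Such a difference between two estimates does not by itself establish that one model predicts better than the other; whether it is larger than chance variation is the question the proposed randomization test addresses.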