Chemometrics and Intelligent Laboratory Systems 42 (1998) 209–220

PLS regression on wavelet compressed NIR spectra

Johan Trygg *, Svante Wold

Research Group for Chemometrics, Department of Organic Chemistry, Umeå University, Umeå S-901 87, Sweden

Abstract

Good compression methods are increasingly needed because of the ever-increasing amount of data being collected. The mere thought of the computational power demanded to calculate a regression model on a large data set with many thousands of variables can often be depressing. This paper should be treated as an introduction to how the discrete wavelet transform can be used in multivariate calibration. It will be shown that by using the fast wavelet transform on individual signals as a preprocessing method in regression modelling on near-infrared (NIR) spectra, good compression is achieved with almost no loss of information. No loss of information means that the predictive ability and the diagnostics, together with the graphical displays of the data compressed regression model, are basically the same as for the original uncompressed regression model. The regression method used here is Partial Least Squares (PLS). In a NIR-VIS example, compression of the data set to 3% of its original size was achieved. © 1998 Elsevier Science B.V. All rights reserved.

Keywords: Discrete wavelet transform; Partial least squares projections to latent structures; Data compression; NIR spectroscopy; Preprocessing techniques

1. Introduction

The idea of representing a signal as the sum of analyzing functions dates back to 1807, when Joseph Fourier presented his theories on the Fourier transform. Wavelet transformation is no different: it is a linear transformation, and its trademarks are good compression and denoising of complicated signals and images. Wavelets look like small oscillating waves, and they have the ability to analyze a signal according to scale, i.e., inverse frequency.
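To make the idea of scale-based analysis and compression concrete, the following is a minimal sketch and not the authors' implementation. It assumes that "Daubechies-4" denotes the four-tap Daubechies filter, uses periodic boundary handling, and requires a signal length that is a power of two. The synthetic two-band "spectrum" and the rule of keeping the `k` largest coefficients are illustrative assumptions only; the function names are hypothetical.

```python
import math

# Four-tap Daubechies low-pass filter (assumption: this is what
# "Daubechies-4" refers to) and the matching high-pass filter.
_S3 = math.sqrt(3.0)
_NORM = 4.0 * math.sqrt(2.0)
H = [(1 + _S3) / _NORM, (3 + _S3) / _NORM, (3 - _S3) / _NORM, (1 - _S3) / _NORM]
G = [H[3], -H[2], H[1], -H[0]]


def dwt_step(x):
    """One level of the periodic DWT: returns (approximation, detail)."""
    n = len(x)
    approx = [sum(H[k] * x[(2 * i + k) % n] for k in range(4)) for i in range(n // 2)]
    detail = [sum(G[k] * x[(2 * i + k) % n] for k in range(4)) for i in range(n // 2)]
    return approx, detail


def idwt_step(approx, detail):
    """Inverse of dwt_step (the transform is orthogonal, so this is its transpose)."""
    n = 2 * len(approx)
    x = [0.0] * n
    for i in range(len(approx)):
        for k in range(4):
            x[(2 * i + k) % n] += H[k] * approx[i] + G[k] * detail[i]
    return x


def fwt(x):
    """Pyramid transform; output layout is [coarsest approx | coarse -> fine details]."""
    n = len(x)
    if n < 4 or n & (n - 1):
        raise ValueError("length must be a power of two, at least 4")
    coeffs = list(x)
    while n >= 4:
        approx, detail = dwt_step(coeffs[:n])
        coeffs[:n] = approx + detail
        n //= 2
    return coeffs


def ifwt(coeffs):
    """Inverse pyramid transform for the layout produced by fwt."""
    x = list(coeffs)
    n = 2
    while n < len(x):
        x[: 2 * n] = idwt_step(x[:n], x[n : 2 * n])
        n *= 2
    return x


def keep_largest(coeffs, k):
    """Zero all but the k largest-magnitude coefficients (illustrative rule)."""
    order = sorted(range(len(coeffs)), key=lambda i: abs(coeffs[i]), reverse=True)
    kept = set(order[:k])
    return [c if i in kept else 0.0 for i, c in enumerate(coeffs)]


def relative_error(x, y):
    """Euclidean norm of (x - y) divided by the norm of x."""
    num = math.sqrt(sum((a - b) ** 2 for a, b in zip(x, y)))
    return num / math.sqrt(sum(a * a for a in x))


# Synthetic smooth signal with two broad bands (made-up data, not NIR measurements).
spectrum = [
    math.exp(-(((i - 90) / 20.0) ** 2) / 2) + 0.6 * math.exp(-(((i - 170) / 30.0) ** 2) / 2)
    for i in range(256)
]
coeffs = fwt(spectrum)
compressed = keep_largest(coeffs, 32)  # keep 32 of 256 coefficients
approximation = ifwt(compressed)
print("relative reconstruction error:", relative_error(spectrum, approximation))
```

Because the transform is orthogonal, the squared reconstruction error equals the summed squares of the discarded coefficients, which is why a smooth signal with few large coefficients compresses well.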
The size of the analyzing window in the wavelet transform varies with scale, and it is this small but very important property, along with the fact that wavelet functions are local in both time and frequency, that makes the wavelet transform versatile and useful. The analyzing mother wavelet used in this paper is the popular Daubechies-4 wavelet function, which forms an orthogonal set of basis functions. Wavelet transformation is becoming increasingly popular in other fields [1,2], and lately there has been a growing number of papers from the chemical community [3,4], where it has been used as a feature extraction tool and for removal of noise. Good tutorials on the wavelet transform have also been given [5,6,24]. Bos and Vrielink [7] have reported the use of the wavelet transformation in the classification of infrared (IR) spectra.

* Corresponding author. Fax: +46-90-13-88-85; e-mail: jtg@chem.umu.se.
0169-7439/98/$19.00 © 1998 Elsevier Science B.V. All rights reserved. PII: S0169-7439(98)00013-6

Alsberg has pointed out many applications