Review Article

From magnetic resonance spectroscopy to classification of tumors. A review of pattern recognition methods

Gisela Hagberg*

Karolinska MR-Research Center, S-171 76 Stockholm and Uppsala University PET-center, UAS, S-751 85 Uppsala, Sweden

Received 28 July 1997; revised 10 November 1997; accepted 1 December 1997

ABSTRACT: This article reviews the wealth of different pattern recognition methods that have been used for magnetic resonance spectroscopy (MRS) based tumor classification. What these methods have in common is that the entire MR spectrum is used to develop linear and non-linear classifiers. The following issues are addressed: (i) pre-processing, such as normalization and digitization, (ii) extraction of relevant spectral features by multivariate methods, such as principal component analysis, linear discriminant analysis (LDA), and optimal discriminant vector, and (iii) classification by LDA, cluster analysis and artificial neural networks. Different approaches are compared and discussed in view of practical and theoretical considerations. © 1998 John Wiley & Sons, Ltd.

KEYWORDS: magnetic resonance spectroscopy; neoplasms; classification; pattern recognition; review

INTRODUCTION

Magnetic resonance spectroscopy (MRS) has been used extensively to investigate tumors (either in vivo or by in vitro analysis of tissue extracts) and their systemic effects (via body-fluid analysis). Since the first attempt at discriminating tumor patients from healthy volunteers on the basis of differing relaxation times of plasma lipids [1], several groups have investigated alterations of single resonances with the aim of classifying human tumors (for reviews, see Refs 2 and 3). The common finding in most studies, independent of the nucleus and the experimental parameters used, has been that the intensities of almost all resonances are altered with respect to normal tissue. Only in some studies have single resonances or ratios of resonance ranges been reported to have discriminatory power [2–7,14]. In parallel, reports about patterns of spectral changes associated with given tumor types [8–15], cell types [16–18] and disease states [19] have been published.

To permit the analysis of tumor-specific MRS patterns and their use for tumor classification, several methods, denoted pattern recognition (PR) methods, are available. Given a set of MR spectra from different tumor types, PR methods can be applied to the entire set of data to detect the group structure; this is called unsupervised learning. The group membership of each case can then be compared with the known histological tumor assignment to check the validity of the PR method. Examples of PR methods of this kind are cluster analysis and self-organizing artificial neural networks (ANN).

An alternative is to subdivide the data set into a training set and a test set, a technique called supervised learning. The known histological assignment of the training set is used to develop a classifier. The classifier can be obtained either by applying the PR method directly or by first selecting spectral features that have a high power to separate the different tumor types. Examples of PR methods that are used to select (sometimes denoted extract) spectral features are principal component analysis (PCA), linear discriminant analysis (LDA), and optimum discriminant vector (ODV) analysis. The classifier developed using the training set is applied to the test set, and the validity of the classifier can be checked by comparison with the known histological assignment of the test set.

Since PR methods may prove useful in the biomedical sciences, not least for computerized medical diagnostic expert systems, it is probable that existing methods will be refined and that new methods will emerge in the future.

NMR IN BIOMEDICINE, NMR Biomed 11, 148–156 (1998). CCC 0952–3480/98/040148–09 $17.50

*Correspondence to: G. Hagberg, Uppsala University PET-center, UAS, S-751 85 Uppsala, Sweden.
Abbreviations used: Ala, alanine; ANN, artificial neural networks; Cho, choline-containing compounds; FID, free induction decay; Gly, glycine; LDA, linear discriminant analysis; LOO, leave-one-out; MR, magnetic resonance; MRS, magnetic resonance spectroscopy; NMR, nuclear magnetic resonance; ODV, optimum discriminant vector; OFS, optimum feature selector; PC, principal component; PCA, principal component analysis; PCr/Cr, phosphocreatine/creatine; PR, pattern recognition; WT, wavelet transform.
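
The supervised scheme outlined in the Introduction (develop a classifier on a training set with known histological assignment, then check it against a test set) can be illustrated with a minimal Python sketch. Everything in it is an assumption made for illustration and is not taken from the reviewed studies: the spectra are synthetic 16-point vectors with a single resonance, the class names `type_A` and `type_B` are hypothetical, and a nearest-centroid rule is used as a simple stand-in for the classifiers discussed in this review (LDA, cluster analysis, ANN).

```python
import random


def synthetic_spectrum(peak_index, rng, n_points=16, noise=0.05):
    """Toy 'spectrum' (assumption): one unit-height resonance plus Gaussian noise."""
    return [(1.0 if i == peak_index else 0.0) + rng.gauss(0.0, noise)
            for i in range(n_points)]


def centroid(spectra):
    """Point-by-point mean of a list of equal-length spectra."""
    n = len(spectra)
    return [sum(column) / n for column in zip(*spectra)]


def squared_distance(a, b):
    """Squared Euclidean distance between two spectra."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def train(spectra, labels):
    """Supervised step: one mean spectrum per known class label."""
    return {c: centroid([s for s, l in zip(spectra, labels) if l == c])
            for c in sorted(set(labels))}


def classify(model, spectrum):
    """Assign the class whose mean training spectrum is closest."""
    return min(model, key=lambda c: squared_distance(model[c], spectrum))


def evaluate(seed=0, per_class=20):
    """Split each class into training and test halves; return test accuracy."""
    rng = random.Random(seed)
    # Hypothetical classes that differ only in the position of one resonance.
    groups = {
        "type_A": [synthetic_spectrum(4, rng) for _ in range(per_class)],
        "type_B": [synthetic_spectrum(10, rng) for _ in range(per_class)],
    }
    train_x, train_y, test_x, test_y = [], [], [], []
    for label, spectra in groups.items():
        half = len(spectra) // 2
        train_x += spectra[:half]
        train_y += [label] * half
        test_x += spectra[half:]
        test_y += [label] * (len(spectra) - half)
    model = train(train_x, train_y)
    hits = sum(classify(model, s) == l for s, l in zip(test_x, test_y))
    return hits / len(test_x)


if __name__ == "__main__":
    print(f"test-set accuracy: {evaluate():.2f}")
```

The test spectra play no part in building the class means, so the reported accuracy reflects how the classifier handles cases it has not seen. This is the role the known histological assignment of the test set plays in the text. Real spectra would first need the pre-processing steps listed in the abstract, which this sketch omits.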