Analytica Chimica Acta 552 (2005) 13–24

Classification of olive oils using high throughput flow ¹H NMR fingerprinting with principal component analysis, linear discriminant analysis and probabilistic neural networks

Serge Rezzi a, David E. Axelson b, Károly Héberger a,c, Fabiano Reniero a, Carlo Mariani d, Claude Guillou a

a European Commission, Joint Research Centre, Institute for Health and Consumer Protection, Physical and Chemical Exposure Unit, BEVABS T.P. 740, I-21020 Ispra (VA), Italy
b MRi Consulting, 8 Wilmot St., Kingston, Ont., Canada K7L 4V1
c Chemical Research Center, Hungarian Academy of Sciences, Institute of Chemistry, H-1525 Budapest, P.O. Box 17, Hungary
d Stazione Sperimentale Olii e Grassi, Milano, Italy

Received 27 May 2005; received in revised form 24 June 2005; accepted 19 July 2005
Available online 31 August 2005

Abstract

The combination of ¹H NMR fingerprinting with multivariate analysis provides an original approach to studying the profile of olive oil in relation to its geographical origin and processing. The present work aims to illustrate the relevance of ¹H NMR fingerprints for assessing the geographical origin and the year of production of olive oils from various Mediterranean areas. Multivariate (chemometric) techniques are able to filter out the most relevant information from a spectrum, e.g. for classification. Principal component analysis (PCA) was carried out on the 12,000 variables (chemical shifts), and four data sets were defined prior to PCA. Linear discriminant analysis (LDA) of the first 50 PCs was applied to classify the olive oil samples (97 or 91) according to geographic origin and year of production. The data analysis was carried out both with and without outliers. Variable selection for LDA was achieved using (i) the best five variables and (ii) an interactive forward stepwise procedure.
Using LDA on the external validation sets, the correct classification rate varied between 47 and 75% (random selection) and between 35 and 92% (Kennard–Stone (KS) selection), depending on geographic origin (country) and production year. A similar success rate could be achieved using partial least squares discriminant analysis (PLS-DA). The success rate can be considerably improved by using probabilistic neural networks (PNN): correct classification by PNN varied between 58 and 100% on the external validation sets. Other chemometric techniques, such as multiple linear regression or generalized pair-wise correlation, did not give better results.

© 2005 Elsevier B.V. All rights reserved.

Keywords: NMR; Authenticity; Multivariate methods; Linear discriminant analysis; Principal component analysis; Chemometrics; Artificial neural networks

The paper was originally presented at the 9th International Conference on Chemometrics in Analytical Chemistry, 20–23 September 2004, Lisbon.
Corresponding author. Tel.: +36 1 438 11 03; fax: +36 1 438 11 43. E-mail address: heberger@chemres.hu (K. Héberger).
0003-2670/$ – see front matter. doi:10.1016/j.aca.2005.07.057

1. Introduction

Olive oil is a very important agricultural product for most countries of the Mediterranean basin. According to recent estimates of the olive oil market, the European Union (EU) accounts for 78% of world production, followed by Turkey (6%), Syria (6%), Tunisia (3%) and Morocco (2%). The EU also dominates world consumption (73%), while the rest of the production is absorbed by the USA (8%), Japan, Canada and Australia (1% each). Within the EU, Spain, Italy and Greece are mainly responsible for olive oil production, with around 865, 590 and 375 thousand tons reached in 2003, respectively. The high added value of olive oil makes its control an important goal for EU producers and consumers. There is thus a need to develop reliable analytical methods to ensure compliance with labeling, i.e. the control of geographical origin giving also support to the