Analytica Chimica Acta 552 (2005) 13–24
Classification of olive oils using high throughput flow ¹H NMR
fingerprinting with principal component analysis, linear discriminant
analysis and probabilistic neural networks
Serge Rezzi [a], David E. Axelson [b], Károly Héberger [a,c,∗], Fabiano Reniero [a], Carlo Mariani [d], Claude Guillou [a]
[a] European Commission, Joint Research Centre, Institute for Health and Consumer Protection, Physical and Chemical Exposure Unit,
BEVABS T.P. 740, I-21020 Ispra (VA), Italy
[b] MRi Consulting, 8 Wilmot St., Kingston, Ont., Canada K7L 4V1
[c] Chemical Research Center, Hungarian Academy of Sciences, Institute of Chemistry, H-1525 Budapest, P.O. Box 17, Hungary
[d] Stazione Sperimentale Olii e Grassi, Milano, Italy
Received 27 May 2005; received in revised form 24 June 2005; accepted 19 July 2005
Available online 31 August 2005
Abstract
The combination of ¹H NMR fingerprinting with multivariate analysis provides an original approach to study the profile of olive oil in
relation to its geographical origin and processing. The present work aims at illustrating the relevance of ¹H NMR fingerprints for assessing
the geographical origin and the year of production for olive oils from various Mediterranean areas. Multivariate (chemometric) techniques
can extract the most relevant information from a spectrum, e.g. for classification. Principal component analysis (PCA) was carried
out on the ∼12,000 variables (chemical shifts), and four data sets were defined prior to PCA. Linear discriminant analysis (LDA) of the first
50 PCs was applied to classify the olive oil samples (97 or 91) according to geographic origin and year of production. The data
analysis was carried out both with and without outliers. Variable selection for LDA was achieved using (i) the best five variables and
(ii) an interactive forward stepwise procedure. Using LDA on the external validation sets, the rate of correct classification varied between 47 and 75%
(random selection) and between 35 and 92% (Kennard–Stone (KS) selection), depending on geographic origin (country) and production year.
A similar success rate could be achieved using partial least squares discriminant analysis (PLS-DA). The success rate can be considerably
improved by using probabilistic neural networks (PNN). Correct classification by PNN varied between 58 and 100% on the external validation
sets. Other chemometric techniques, such as multiple linear regression or generalized pair-wise correlation, did not give better results.
© 2005 Elsevier B.V. All rights reserved.
Keywords: NMR; Authenticity; Multivariate methods; Linear discriminant analysis; Principal component analysis; Chemometrics; Artificial neural networks
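The Kennard–Stone (KS) selection mentioned in the abstract can be illustrated with a minimal sketch. It assumes Euclidean distances between rows of a small score matrix (for instance PC scores); the abstract does not state which distance or which variables were used for the selection, so these choices, and the function name `kennard_stone`, are illustrative only and not the authors' implementation.

```python
import math


def kennard_stone(samples, n_select):
    """Return the indices of n_select rows chosen by the Kennard-Stone rule.

    samples: sequence of equal-length numeric tuples (assumed: PC scores).
    Euclidean distance is an assumption made for this sketch.
    """
    n = len(samples)
    if not 2 <= n_select <= n:
        raise ValueError("n_select must be between 2 and len(samples)")

    def dist(i, j):
        return math.dist(samples[i], samples[j])

    # Start with the two samples that are farthest apart.
    first, second = max(
        ((i, j) for i in range(n) for j in range(i + 1, n)),
        key=lambda pair: dist(*pair),
    )
    selected = [first, second]
    remaining = set(range(n)) - set(selected)

    # Distance from each remaining sample to its nearest selected sample.
    nearest = {i: min(dist(i, first), dist(i, second)) for i in remaining}

    while len(selected) < n_select:
        # Add the sample whose nearest selected neighbour is farthest away;
        # sorting makes tie-breaking deterministic (lowest index wins).
        chosen = max(sorted(remaining), key=lambda i: nearest[i])
        selected.append(chosen)
        remaining.remove(chosen)
        del nearest[chosen]
        for i in remaining:
            nearest[i] = min(nearest[i], dist(i, chosen))
    return selected
```

In common practice the selected samples form the training set and the remaining samples are held out for external validation; whether the authors split their data this way is not stated in the abstract. The rule spreads the selected samples uniformly over the data space, unlike a random split.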
The paper was originally presented at the 9th International Conference on Chemometrics in Analytical Chemistry, 20–23 September 2004, Lisbon.
∗ Corresponding author. Tel.: +36 1 438 11 03; fax: +36 1 438 11 43.
E-mail address: heberger@chemres.hu (K. Héberger).
1. Introduction
Olive oil is a very important agricultural product for most
of the countries of the Mediterranean basin. According to
recent estimates of olive oil markets, the European Union
(EU) accounts for 78% of world production, followed by
Turkey (6%), Syria (6%), Tunisia (3%) and Morocco (2%).
The EU also dominates world consumption (73%), while
the rest of the production is absorbed by the USA (8%), Japan,
Canada and Australia (1% each). Within the EU, Spain,
Italy and Greece are the main olive oil producers,
with around 865, 590 and 375 thousand tons
in 2003, respectively. The high added value of olive
oil makes its control an important goal for EU producers
and consumers. There is thus a need to develop reliable
analytical methods to ensure compliance with labeling, i.e.
the control of geographical origin, also giving support to the
0003-2670/$ – see front matter © 2005 Elsevier B.V. All rights reserved.
doi:10.1016/j.aca.2005.07.057