European Food Research and Technology
https://doi.org/10.1007/s00217-020-03480-5
ORIGINAL PAPER
Characterization of Cabernet Sauvignon wines from California:
determination of origin based on ICP‑MS analysis and machine
learning techniques
Nattane Luíza da Costa¹,² · Joao Paulo Bianchi Ximenez³ · Jairo Lisboa Rodrigues⁴ · Fernando Barbosa Jr³ · Rommel Barbosa¹
Received: 10 November 2019 / Revised: 1 March 2020 / Accepted: 9 March 2020
© Springer-Verlag GmbH Germany, part of Springer Nature 2020
Abstract
In this paper, samples of Cabernet Sauvignon wines produced in California were analyzed on the basis of their elemental
content and classified according to their geographical origin using machine learning. Overall, 13 metals (Al, Cd, Co,
Cr, Cu, Li, Mn, Ni, P, Pb, Rb, Sr, and Zn) were determined by inductively coupled plasma mass spectrometry (ICP-MS). We
used two variable selection algorithms to estimate the relevance of each metal to the classification. Predictive models
based on chemometric tools and machine learning algorithms were developed to differentiate the origin of the wine samples.
Li and Sr were identified as the elements mainly responsible for the differentiation of the samples. Random Forest
correctly classified all samples. A second analysis was performed without the variables Li and Sr to investigate the
relevance of the other metals. We found that a group of seven variables (Cd, Ni, Mn, Pb, Rb, Co, Cu) was able to
discriminate the wines with 89% accuracy using Support Vector Machines. The results suggest that the methodology
developed with advanced machine learning techniques is robust and reliable for the geographical classification of wine
samples and for the study of the elements that characterize the regions.
Keywords Wine classification · Feature selection · Machine learning · Support vector machines · Elemental content
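The workflow summarized in the abstract (rank the elements by relevance, classify, then repeat the classification without the dominant elements) can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration: the data are synthetic and are not the paper's measurements, the region labels are hypothetical, a Fisher score stands in for the paper's two variable selection algorithms, and a nearest-centroid rule under leave-one-out cross-validation stands in for Random Forest and Support Vector Machines. Only the element names come from the abstract.

```python
import random
from statistics import mean, pvariance

# Element names from the abstract; all values below are synthetic.
ELEMENTS = ["Al", "Cd", "Co", "Cr", "Cu", "Li", "Mn", "Ni", "P", "Pb", "Rb", "Sr", "Zn"]
REGIONS = ("region_A", "region_B")  # hypothetical labels, not the paper's regions


def make_synthetic(n_per_region=30, seed=0):
    """Synthetic stand-in data: only Li and Sr differ between the two regions."""
    rng = random.Random(seed)
    X, y = [], []
    for region, shift in zip(REGIONS, (0.0, 4.0)):
        for _ in range(n_per_region):
            X.append([rng.gauss(shift if el in ("Li", "Sr") else 0.0, 1.0)
                      for el in ELEMENTS])
            y.append(region)
    return X, y


def fisher_score(X, y, col):
    """Two-class Fisher score: squared mean difference over summed variances."""
    a = [row[col] for row, label in zip(X, y) if label == REGIONS[0]]
    b = [row[col] for row, label in zip(X, y) if label == REGIONS[1]]
    return (mean(a) - mean(b)) ** 2 / (pvariance(a) + pvariance(b))


def loo_accuracy(X, y, cols):
    """Leave-one-out accuracy of a nearest-centroid classifier on the given columns."""
    correct = 0
    for i, (row, truth) in enumerate(zip(X, y)):
        # Class centroids are built from every sample except the held-out one.
        centroids = {}
        for region in REGIONS:
            rest = [X[j] for j in range(len(X)) if j != i and y[j] == region]
            centroids[region] = [mean(r[c] for r in rest) for c in cols]

        def sq_dist(region):
            return sum((row[c] - m) ** 2 for c, m in zip(cols, centroids[region]))

        correct += min(REGIONS, key=sq_dist) == truth
    return correct / len(X)


X, y = make_synthetic()
all_cols = list(range(len(ELEMENTS)))

# Step 1: rank elements by relevance (most discriminating first).
ranked = sorted(all_cols, key=lambda c: fisher_score(X, y, c), reverse=True)
ranked_names = [ELEMENTS[c] for c in ranked]

# Step 2: classify with all elements, then again without the two top-ranked ones.
reduced_cols = [c for c in all_cols if c not in ranked[:2]]
acc_all = loo_accuracy(X, y, all_cols)
acc_reduced = loo_accuracy(X, y, reduced_cols)

print("ranking:", ranked_names)
print("LOO accuracy, all elements:", acc_all)
print("LOO accuracy, top two removed:", acc_reduced)
```

For brevity the ranking here is computed on all samples; in a real evaluation it would be recomputed inside each cross-validation fold so that the held-out sample does not influence the selection. Because the synthetic data carry no signal outside Li and Sr, the reduced model performs near chance, which is not what the paper reports for its real measurements.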
Introduction
The fingerprinting of the trace metal content of wines is
a valuable method to authenticate their geographical origin.
The presence and concentration of metals in the soil
on which the vines were grown enable their use to characterize
the wines, i.e., the elements move from rock to soil and from
soil to grape [1]. In particular, wine authenticity has been
extensively investigated because this beverage is an easily
adulterated product and consumers are interested in
foods strongly identified with a place of origin [1, 2].
World wine production reached a volume of 292.3 million
hectoliters in 2018 [3]. California, the geographical
origin of the wines analyzed in this study, is a state
world-renowned for its ability to produce world-class
wine. Napa is a premier wine-producing region whose wines
are of higher quality than those of the rest of California [4].
In this context, the authenticity of wines from California wine
regions is an important issue. Multivariate data analysis
and machine learning techniques are powerful tools for
quality control and wine authentication and have been
used to discriminate wines from all around the world [1].
Cabernet Sauvignon is by far the most important
varietal for achieving high wine prices in California [4]. In
spite of that, few studies have classified California
wines produced from this grape variety. Californian
wines made from grapes in different maturation states were
classified by Umali et al. based on tannin content [5], and
Hopfer and coworkers [6] classified the intraregional origin
* Rommel Barbosa
rommel@inf.ufg.br
¹ Instituto de Informática, Universidade Federal de Goiás, Alameda Palmeiras, Quadra D, Câmpus Samambaia, Goiânia, GO 74690-900, Brazil
² Núcleo de Informática, Instituto Federal Goiano, Urutaí, GO, Brazil
³ Faculdade de Ciências Farmacêuticas de Ribeirão Preto, Universidade de São Paulo, Ribeirão Preto, SP, Brazil
⁴ Universidade Federal dos Vales do Jequitinhonha e Mucuri, Teófilo Otoni, MG, Brazil