European Food Research and Technology
https://doi.org/10.1007/s00217-020-03480-5

ORIGINAL PAPER

Characterization of Cabernet Sauvignon wines from California: determination of origin based on ICP-MS analysis and machine learning techniques

Nattane Luíza da Costa 1,2 · Joao Paulo Bianchi Ximenez 3 · Jairo Lisboa Rodrigues 4 · Fernando Barbosa Jr 3 · Rommel Barbosa 1

Received: 10 November 2019 / Revised: 1 March 2020 / Accepted: 9 March 2020
© Springer-Verlag GmbH Germany, part of Springer Nature 2020

Abstract
In this paper, samples of Cabernet Sauvignon wines produced in California were analyzed on the basis of their elemental content and classified according to their geographical origin using machine learning. Overall, 13 metals (Al, Cd, Co, Cr, Cu, Li, Mn, Ni, P, Pb, Rb, Sr, and Zn) were determined by inductively coupled plasma mass spectrometry (ICP-MS). We used two variable selection algorithms to estimate the relevance of each metal to the classification. Predictive models based on chemometric tools and machine learning algorithms were developed to differentiate the origin of the wine samples. Li and Sr were identified as the elements mainly responsible for the differentiation of the samples. Applying Random Forest allowed all samples to be classified correctly. A second analysis was performed with the variables Li and Sr removed to investigate the relevance of the other metals. We found that a group of seven variables (Cd, Ni, Mn, Pb, Rb, Co, Cu) was able to discriminate the wines with 89% accuracy using Support Vector Machines. The results suggest that the methodology developed with advanced machine learning techniques is robust and reliable for the geographical classification of wine samples and for the study of the elements that characterize the regions.
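The workflow summarized in the abstract (rank the elements by relevance, classify, then repeat without the top-ranked elements) can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration: the concentrations are synthetic, the region names `region_A` and `region_B` are hypothetical, and a univariate Fisher score with a nearest-centroid classifier stands in for the variable selection algorithms, Random Forest, and Support Vector Machines used in the study. It uses only the Python standard library and does not reproduce the reported results.

```python
import random
import statistics

# Subset of the 13 measured elements, chosen only to keep the sketch short.
ELEMENTS = ["Li", "Sr", "Mn", "Rb"]


def make_synthetic_samples(n_per_region=20, seed=0):
    """Return (concentrations, region) pairs drawn from made-up distributions."""
    rng = random.Random(seed)
    # Hypothetical region means: Li and Sr differ strongly, Mn and Rb barely.
    means = {
        "region_A": {"Li": 10.0, "Sr": 50.0, "Mn": 30.0, "Rb": 20.0},
        "region_B": {"Li": 20.0, "Sr": 80.0, "Mn": 31.0, "Rb": 20.5},
    }
    samples = []
    for region, mu in means.items():
        for _ in range(n_per_region):
            row = {el: rng.gauss(mu[el], 0.1 * mu[el]) for el in ELEMENTS}
            samples.append((row, region))
    return samples


def fisher_score(samples, element):
    """Between-region sum of squares divided by within-region sum of squares."""
    by_region = {}
    for row, region in samples:
        by_region.setdefault(region, []).append(row[element])
    grand_mean = statistics.fmean([row[element] for row, _ in samples])
    between = sum(
        len(vals) * (statistics.fmean(vals) - grand_mean) ** 2
        for vals in by_region.values()
    )
    within = sum(
        (x - statistics.fmean(vals)) ** 2
        for vals in by_region.values()
        for x in vals
    )
    return between / within


def rank_elements(samples, elements):
    """Sort elements from most to least discriminating by Fisher score."""
    return sorted(elements, key=lambda el: fisher_score(samples, el), reverse=True)


def loo_accuracy(samples, elements):
    """Leave-one-out accuracy of a nearest-centroid classifier on z-scores."""
    correct = 0
    for i, (test_row, test_region) in enumerate(samples):
        train = samples[:i] + samples[i + 1:]
        # Scaling statistics come from the training samples only.
        stats = {}
        for el in elements:
            vals = [row[el] for row, _ in train]
            stats[el] = (statistics.fmean(vals), statistics.stdev(vals))

        def zscore(row):
            return [(row[el] - stats[el][0]) / stats[el][1] for el in elements]

        centroids = {}
        for region in sorted({r for _, r in train}):
            points = [zscore(row) for row, r in train if r == region]
            centroids[region] = [statistics.fmean(col) for col in zip(*points)]
        z_test = zscore(test_row)
        predicted = min(
            centroids,
            key=lambda r: sum((a - b) ** 2 for a, b in zip(z_test, centroids[r])),
        )
        correct += predicted == test_region
    return correct / len(samples)


if __name__ == "__main__":
    data = make_synthetic_samples()
    ranking = rank_elements(data, ELEMENTS)
    print("ranking:", ranking)
    print("all elements:", loo_accuracy(data, ELEMENTS))
    print("without top two:", loo_accuracy(data, ranking[2:]))
```

In this synthetic setting the remaining elements carry almost no signal, so accuracy drops sharply once the two top-ranked elements are removed. The study reports a different outcome for real wines, where seven of the other metals still reached 89% accuracy.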
Keywords Wine classification · Feature selection · Machine learning · Support vector machines · Elemental content

* Rommel Barbosa
  rommel@inf.ufg.br

1 Instituto de Informática, Universidade Federal de Goiás, Alameda Palmeiras, Quadra D, Câmpus Samambaia, Goiânia, GO 74690-900, Brazil
2 Núcleo de Informática, Instituto Federal Goiano, Urutaí, GO, Brazil
3 Faculdade de Ciências Farmacêuticas de Ribeirão Preto, Universidade de São Paulo, Ribeirão Preto, SP, Brazil
4 Universidade Federal dos Vales do Jequitinhonha e Mucuri, Teófilo Otoni, MG, Brazil

Introduction

The fingerprinting of the trace metal content of wines is a valuable method for authenticating their geographical origin. The presence and concentration of metals in the soil on which the vines were grown enable their use to characterize the wines, i.e., the elements move from rock to soil and from soil to grape [1]. In particular, wine authenticity has been extensively investigated because this beverage is easily adulterated and consumers are interested in foods strongly identified with a place of origin [1, 2].

World wine production reached a volume of 292.3 million hectoliters in 2018 [3]. California, the geographical origin of the wines analyzed in this study, is world-renowned for its ability to produce world-class wine. Napa is a premier wine-producing region whose wines are of higher quality than those of the rest of California [4]. In this context, the authenticity of wines from California wine regions is an important issue. Multivariate data analysis and machine learning techniques are powerful tools for quality control and wine authentication, and they have been used to discriminate wines from all around the world [1].

Cabernet Sauvignon is by far the most important varietal for achieving high wine prices in California [4]. In spite of that, few studies have classified California wines produced with this grape variety. Californian wines made from grapes at different maturation stages were classified by Umali et al. based on tannin content [5], and Hopfer and coworkers [6] classified the intraregional origin