Authenticity assessment and protection of high-quality Nebbiolo-based Italian wines through machine learning

Luigi Portinale a,*, Giorgio Leonardi a, Marco Arlorio b, Jean Daniel Coïsson b, Fabiano Travaglia b, Monica Locatelli b

a Computer Science Institute, Dipartimento di Scienze e Innovazione Tecnologica, University of Piemonte Orientale, Viale Teresa Michel 11, 15121, Alessandria, Italy
b Dipartimento di Scienze del Farmaco and Drug and Food Biotechnology Center, University of Piemonte Orientale A. Avogadro, Largo Donegani 2, 28100, Novara, NO, Italy

ARTICLE INFO

Keywords: Wine authentication; Machine learning; Multi-class classification; Support vector machine; Bayesian network classifier; Multi-layer perceptron

ABSTRACT

This paper discusses an intelligent data analysis approach, based on machine learning techniques, aimed at defining methods for chemical data analysis that assess the authenticity of some of the highest-value Nebbiolo-based wines from Piedmont (Italy) and protect them against fake versions. This is a highly relevant issue in the wine market, where commercial frauds involving such products are estimated to be worth millions of Euros. The objective is twofold: to show that the problem can be addressed without expensive and hyper-specialized wine chemical analyses, and to demonstrate the actual usefulness of classification algorithms for data mining and machine learning on the resulting chemical profiles. Following Wagstaff's proposal for practical exploitation of machine learning approaches, we describe how data have been collected and prepared for the production of different datasets, how suitable classification models have been identified, and how the interpretation of the results suggests the emergence of an active role for machine learning classification techniques, based on standard chemical profiling, in assessing the authenticity of the wines targeted by the study.
Experiments have been performed both with datasets of real samples and with synthetic datasets artificially generated from real data.

1. Introduction

The quality and safety profiles of fine wines represent a peculiar case of the notion of food integrity, because of the very high value of a single bottle and because of their complex chemical profile, which requires specific and robust methods for their univocal profiling/authentication. Vitis vinifera is the only grape allowed for winemaking, but many different genetic varieties (e.g. Pinot, Nebbiolo, Merlot, Sangiovese, Sirah and many others) lead to wines with different character and chemical profiles. Industrial processing largely builds the specificity of the wine. Moreover, the terroir (the set of special characteristics that the geography, the geology and the microclimate of a certain region or peculiar location, interacting with grape genetics, express in wine), while contributing to the diversification of the product, significantly complicates the metabolomic profile of wine and, thus, the process of traceability and identification. Although specific regulations exist in this matter, and some analytical approaches and protocols are well established for wine tracking and authentication, quality wines are highly subject to adulteration. Wine fraud is therefore a major issue worldwide, causing significant problems for consumers; it also destabilizes the wine market, particularly regarding the quality aspect, with an estimated impact of about 7% of the whole market value. A frequent type of counterfeiting in the wine sector is mislabeling, regarding both the grape cultivar used and the geographical area of origin [1]; it causes an economic impact estimated at several million Euros. Detecting adulterations, or declarations that do not correspond to the labeling, is an official task of wine quality control and consumer protection.
In recent years, analytical methods have been improved in this field. Some of them (stable isotope ratio analysis by nuclear magnetic resonance, and isotope ratio mass spectrometry) have been adopted as official methods by the European Community (EC)

* Corresponding author.
E-mail addresses: luigi.portinale@uniupo.it (L. Portinale), giorgio.leonardi@uniupo.it (G. Leonardi), marco.arlorio@uniupo.it (M. Arlorio), jeandaniel.coisson@uniupo.it (J.D. Coïsson), fabiano.travaglia@uniupo.it (F. Travaglia), monica.locatelli@uniupo.it (M. Locatelli).

Chemometrics and Intelligent Laboratory Systems 171 (2017) 182–197
journal homepage: www.elsevier.com/locate/chemometrics
https://doi.org/10.1016/j.chemolab.2017.10.012
Received 25 May 2017; Received in revised form 18 September 2017; Accepted 23 October 2017; Available online 31 October 2017
0169-7439/© 2017 Elsevier B.V. All rights reserved.