Class modeling techniques in the control of the geographical origin of wines

Michele Forina a,⁎, Paolo Oliveri a, Henry Jäger b, Ute Römisch b, Johanna Smeyers-Verbeke c

a Dipartimento di Chimica e Tecnologie Farmaceutiche ed Alimentari, Via Brigata Salerno 13, I-16147 Genova, Italy
b Technische Universität Berlin, Fak. III, Gustav-Meyer-Allee 25, D-13355 Berlin, Germany
c Vrije Universiteit Brussel, Farmaceutisch Instituut, Laarbeeklaan 103, B-1090 Brussels, Belgium

Article info

Article history: Received 18 June 2009. Received in revised form 17 August 2009. Accepted 21 August 2009. Available online 29 August 2009.

Keywords: Class modeling; SIMCA; UNEQ; MRM; Wine; Specificity; Sensitivity

Abstract

Wine samples from four countries (Hungary, Czech Republic, Romania and South Africa) were studied within the European project WINE-DB, “establishing of a wine data bank for analytical parameters from third countries”. For each country, two types of wine samples were collected over three consecutive years: commercial wines and wines obtained by microvinification according to EC Regulation No. 2729/2000. The sampling design was organized to represent both the grape varieties and the official wine regions in the four countries. The 1188 wine samples were analyzed for 58 chemical quantities. Data analysis was performed with special attention to the real problem, namely the control of fraud. Class modeling techniques (UNEQ, SIMCA, MRM) were applied to answer the general question: “Does sample O, stated to be of class A, really belong to class A?”. Two validation strategies, one based on cross validation and one on an external, representative evaluation set, were used to evaluate carefully the predictive performance of the class models. The results obtained with the four class modeling techniques indicate that for the four countries it is possible to compute models with high efficiency, generally with a reduced number of variables.
To obtain efficient models, red and white wines must be considered separately, as must commercial and microvinification wines. The validity of the models is ensured by the representativeness of the samples, the appropriate application of chemometric techniques, and the validation.

© 2009 Elsevier B.V. All rights reserved.

1. Introduction

This study was performed within the European project WINE-DB, “Establishing of a wine data bank for analytical parameters for wines from third countries”. The European Office for Wine, Alcohol and Spirit Drinks aims to ensure correct implementation of EU wine quality legislation and was set up to combat fraud in this area. To reach its objectives, the Office created the European Wine Databank of authentic European wines, to which analytical data (stable isotopes) of more than a thousand authentic samples are added every year. Because of the enlargement of the European Union and the need to extract useful information from the analytical wine data, the project WINE-DB was planned with the following objectives: a) collecting analytical data from countries not represented in the European Wine Databank; b) evaluating the utility of the analytical data (other than stable isotopes) in the control of the geographical origin; c) comparing the composition of authentic and commercial wines.

The “authentic” wines are those obtained by microvinification according to EC Regulation No. 2729/2000. The “commercial” wines studied in the project were collected by experts of national wine control organizations, so that both their geographical origin and the variety of grapes used are known with certainty. Moreover, the commercial wines were of high quality, from well-known producers. The commercial samples include all anthropogenic influences, so that they differ from the authentic wines in the vinification procedure and in the storage time and conditions.
Control of the geographical origin means that a mathematical model, built with the measured variables, must be used to verify the truthfulness of the origin declared on the bottle. The chemometric techniques that build such models are the class modeling techniques (CMT). They answer the question: “Does sample O, stated to be of class A, really belong to class A?”. In contrast, classification techniques assign objects to one of the classes specified in the problem. The difference is not trivial because, in practice, classification means assigning an origin to an unlabeled sample, a situation that is rather rare. Nevertheless, classification techniques, especially linear discriminant analysis (LDA), are very frequently used in studies on wines and other typical foods, both because the related software can be found easily and because the many available examples with informative plots have made LDA very popular in food science. Class modeling techniques, on the other hand, have been used rarely, and with more attention to their classification ability than to their modeling performance. However, some recent papers [1–4] indicate increased attention to the development and improvement of class modeling techniques and to their use in food control problems.

⁎ Corresponding author. Tel.: +39 010 3532630; fax: +39 010 3532684. E-mail address: forina@dicfta.unige.it (M. Forina).
Chemometrics and Intelligent Laboratory Systems 99 (2009) 127–137. doi:10.1016/j.chemolab.2009.08.002
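The distinction drawn in the Introduction between class modeling and classification can be illustrated with a minimal sketch. It is not the authors' implementation of UNEQ, SIMCA or MRM: it assumes synthetic two-variable data, a Mahalanobis-distance class model with a chi-square critical value (a simplification in the spirit of UNEQ), and a nearest-centroid rule as a stand-in classifier. All names (`fit_class_model`, `accepts`, the data arrays) are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic two-variable data (hypothetical, NOT the WINE-DB measurements).
class_a = rng.multivariate_normal([0.0, 0.0], [[1.0, 0.3], [0.3, 1.0]], size=200)
class_b = rng.multivariate_normal([6.0, 6.0], [[1.0, 0.0], [0.0, 1.0]], size=200)


def fit_class_model(x, confidence=0.95):
    """Fit a Mahalanobis-distance class model using samples of ONE class only."""
    if x.shape[1] != 2:
        raise ValueError("the closed-form critical value below holds for 2 variables")
    mean = x.mean(axis=0)
    cov_inv = np.linalg.inv(np.cov(x, rowvar=False))
    # Chi-square quantile with 2 degrees of freedom: -2 * ln(1 - confidence).
    # Assumption: chi-square approximation instead of an exact small-sample limit.
    threshold = -2.0 * np.log(1.0 - confidence)
    return mean, cov_inv, threshold


def accepts(model, x):
    """Return True for each row of x that falls inside the class boundary."""
    mean, cov_inv, threshold = model
    diff = x - mean
    d2 = np.einsum("ij,jk,ik->i", diff, cov_inv, diff)  # squared Mahalanobis distance
    return d2 <= threshold


model_a = fit_class_model(class_a)

# Sensitivity: fraction of class A samples accepted by the model of class A.
# Computed here on the training samples, so it is optimistic; a real study
# would use cross validation or an external evaluation set.
sensitivity = accepts(model_a, class_a).mean()

# Specificity: fraction of samples of another class rejected by the model of A.
specificity = 1.0 - accepts(model_a, class_b).mean()

# A sample far from both classes: the class model can reject it ...
outlier = np.array([[20.0, -20.0]])
outlier_accepted = bool(accepts(model_a, outlier)[0])

# ... whereas a classifier (nearest centroid here) must assign it to some class.
centroids = np.stack([class_a.mean(axis=0), class_b.mean(axis=0)])
nearest_class = int(np.argmin(np.linalg.norm(centroids - outlier, axis=1)))

print(f"sensitivity={sensitivity:.3f} specificity={specificity:.3f}")
print(f"outlier accepted by model A: {outlier_accepted}; classifier assigns class index {nearest_class}")
```

In this sketch the class model is built from class A alone and can answer "no" for a sample that resembles neither class, while the classifier always returns one of the predefined classes. That is the practical reason class modeling is better matched to verifying a declared origin.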