Independent components analysis to increase efficiency of discriminant analysis methods (FDA and LDA): Application to NMR fingerprinting of wine

Yulia B. Monakhova a,b,*, Rolf Godelmann c, Thomas Kuballa c, Svetlana P. Mushtakova b, Douglas N. Rutledge d

a Spectral Service AG, Emil-Hoffmann-Straße 33, 50996 Cologne, Germany
b Institute of Chemistry, Saratov State University, Astrakhanskaya Street 83, 410012 Saratov, Russia
c Chemisches und Veterinäruntersuchungsamt (CVUA) Karlsruhe, Weissenburger Strasse 3, 76187 Karlsruhe, Germany
d AgroParisTech, UMR 1145, Ingénierie Procédés Aliments, 16 rue Claude Bernard, F-75005 Paris, France

Article info
Article history: Received 12 January 2015; received in revised form 17 March 2015; accepted 22 March 2015; available online 27 March 2015
Keywords: Principal component analysis; Independent component analysis; Discriminant analysis; ¹H NMR spectroscopy; Wine

Abstract
Discriminant analysis (DA) methods, such as linear discriminant analysis (LDA) or factorial discriminant analysis (FDA), are well-known chemometric approaches for solving classification problems in chemistry. In most applications, principal component analysis (PCA) is used as the first step to generate orthogonal eigenvectors, and the corresponding sample scores are used to generate the discriminant features. Independent component analysis (ICA) based on the minimization of mutual information can be used instead of PCA as a preprocessing tool for LDA and FDA classification. To illustrate the performance of this ICA/DA methodology, four representative nuclear magnetic resonance (NMR) data sets of wine samples were used. Classification was performed according to grape variety, year of vintage and geographical origin. The average increase in the percentage of correct classification for ICA/DA compared with PCA/DA ranged from 6±1% to 8±2%.
The maximum increase in classification efficiency, 11±2%, was observed for discrimination of the year of vintage (ICA/FDA) and of geographical origin (ICA/LDA). The procedure to determine the number of extracted features (PCs, ICs) for the optimum DA models was discussed. Using independent components (ICs) instead of principal components (PCs) improved the classification performance of the DA methods. The ICA/LDA method is preferable to ICA/FDA for recognition tasks based on NMR spectroscopic measurements.

© 2015 Elsevier B.V. All rights reserved.

1. Introduction

Modern analytical instruments can record large amounts of information (variables or features) for a large number of samples (objects) in a relatively short time. The resulting multivariate data matrices require mathematical and statistical procedures to extract the maximum of useful information from the data efficiently [1]. Among pattern recognition techniques, discriminant analysis (DA) methods are widely used, especially in food analysis, to obtain a comprehensive, multivariate description of the data without assigning particular signals to specific metabolites [1–3]. The basic idea of DA methods is that knowing the category of the samples in a training set makes it possible to develop a classification model applicable to unknown samples [1–3]. Factorial discriminant analysis (FDA) and linear discriminant analysis (LDA) are two of the most popular and successful classification methods [1–3].

LDA is based on the determination of linear discriminant functions that maximize the ratio of between-class variance to within-class variance; these functions are obtained by a generalized eigen-decomposition [4]. In LDA, classes are assumed to follow a multivariate normal distribution and to be linearly separable. Each latent variable obtained in LDA is a linear combination of the original variables.
This function is called a canonical variate. For k classes, k − 1 canonical variates can be determined if the number of variables is larger than k [1,5]. LDA requires that the variance–covariance matrices of the predefined classes can be pooled. This is only possible when these matrices can be considered equivalent, which means that their 95% confidence ellipsoids have

Talanta 141 (2015) 60–65; journal homepage: www.elsevier.com/locate/talanta (ScienceDirect); http://dx.doi.org/10.1016/j.talanta.2015.03.037; 0039-9140/© 2015 Elsevier B.V.

* Corresponding author at: Spectral Service AG, Emil-Hoffmann-Straße 33, 50996 Cologne, Germany. Tel.: +49 2236 9694729; fax: +49 2236 9694711. E-mail address: yul-monakhova@mail.ru (Y.B. Monakhova).
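The LDA step described above (a generalized eigen-decomposition that maximizes between-class relative to within-class variance, yielding k − 1 canonical variates from a pooled within-class matrix) can be illustrated with a minimal NumPy sketch. It is not the authors' implementation and rests on several assumptions: the input features are PCA scores (the PCA/DA baseline; the mutual-information-based ICA preprocessing from the abstract is not reproduced), the generalized problem S_B w = λ S_W w is solved through a Cholesky reduction, and samples are assigned to the nearest class centroid in canonical-variate space, which is only one possible decision rule. The function names (fit_pca, scatter_matrices, lda_canonical_variates, nearest_centroid) and the synthetic data are hypothetical.

```python
import numpy as np


def fit_pca(X, n_components):
    """Return the column means and the first n_components PCA loadings of X."""
    mean = X.mean(axis=0)
    # Rows of Vt are orthonormal eigenvectors of the covariance matrix,
    # sorted by decreasing explained variance.
    _, _, Vt = np.linalg.svd(X - mean, full_matrices=False)
    return mean, Vt[:n_components].T  # loadings: (n_variables, n_components)


def scatter_matrices(Z, y):
    """Pooled within-class (S_W) and between-class (S_B) scatter matrices."""
    grand_mean = Z.mean(axis=0)
    p = Z.shape[1]
    S_W = np.zeros((p, p))
    S_B = np.zeros((p, p))
    for c in np.unique(y):
        Zc = Z[y == c]
        mc = Zc.mean(axis=0)
        # Summing the class scatters pools them: this is where the
        # equal-covariance assumption of LDA enters.
        S_W += (Zc - mc).T @ (Zc - mc)
        d = (mc - grand_mean)[:, None]
        S_B += len(Zc) * (d @ d.T)
    return S_W, S_B


def lda_canonical_variates(Z, y):
    """Solve S_B w = lambda S_W w and keep the k - 1 leading directions."""
    S_W, S_B = scatter_matrices(Z, y)
    k = len(np.unique(y))
    # Cholesky reduction: with S_W = L L^T and u = L^T w, the problem becomes
    # the symmetric eigenproblem (L^-1 S_B L^-T) u = lambda u.
    L = np.linalg.cholesky(S_W)
    L_inv = np.linalg.inv(L)
    evals, U = np.linalg.eigh(L_inv @ S_B @ L_inv.T)
    order = np.argsort(evals)[::-1][: k - 1]  # largest variance ratios first
    W = L_inv.T @ U[:, order]  # back-transform: w = L^-T u
    return W, evals[order]


def nearest_centroid(F_train, y_train, F_new):
    """Assign each row of F_new to the class with the closest training centroid."""
    classes = np.unique(y_train)
    centroids = np.array([F_train[y_train == c].mean(axis=0) for c in classes])
    dist2 = ((F_new[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=2)
    return classes[np.argmin(dist2, axis=1)]


if __name__ == "__main__":
    # Hypothetical synthetic "spectra": 3 classes x 20 samples x 50 variables.
    rng = np.random.default_rng(0)
    X = rng.normal(size=(60, 50))
    y = np.repeat([0, 1, 2], 20)
    X[y == 1, :5] += 3.0
    X[y == 2, 5:10] += 3.0

    mean, P = fit_pca(X, n_components=5)
    Z = (X - mean) @ P                # PCA scores used as DA features
    W, ratios = lda_canonical_variates(Z, y)
    F = Z @ W                         # canonical-variate scores
    pred = nearest_centroid(F, y, F)  # resubstitution only, not validation
    print("canonical variates:", W.shape[1], "accuracy:", (pred == y).mean())
```

Because W is scaled so that Wᵀ S_W W = I, Euclidean distances in the canonical-variate space are measured in units of the pooled within-class scatter, which is why the pooling (equal-covariance) assumption matters. Replacing the PCA step with an ICA decomposition is the substitution the paper studies; it is not shown in this sketch.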