Identification of significant factors by an extension of ANOVA–PCA based on multi-block analysis

D. Jouan-Rimbaud Bouveresse a,b,⁎, R. Climaco Pinto b,c, L.M. Schmidtke d, N. Locquet a,b, D.N. Rutledge a,b

a INRA, UMR 1145 Ingénierie Procédés Aliments, F-75005 Paris, France
b AgroParisTech, UMR 1145 Ingénierie Procédés Aliments, F-75005 Paris, France
c Computational Life Science Cluster (CLiC), KBC, Umeå University, S-90187 Umeå, Sweden
d National Wine and Grape Industry Center, School of Agriculture and Wine Sciences, Charles Sturt University, Wagga Wagga, NSW 2650, Australia

Article info

Article history: Received 14 December 2009; Received in revised form 10 May 2010; Accepted 13 May 2010; Available online 25 May 2010

Keywords: Multi-block analysis; Common Component and Specific Weights Analysis; ComDim; ANOVA–PCA; F-test

Abstract

A modification of the ANOVA–PCA method, proposed by Harrington et al. to identify significant factors and interactions in an experimental design, is presented in this article. The modified method uses the idea of multiple table analysis, and looks for the common dimensions underlying the different data tables, or data blocks, generated by the “ANOVA-step” of the ANOVA–PCA method, in order to identify the significant factors. In this paper, the “Common Component and Specific Weights Analysis” method is used to analyse the calculated multi-block data set. This new method, called AComDim, was compared to the standard ANOVA–PCA method, by analysing four real data sets. Parameters computed during the AComDim procedure enable the computation of F-values to check whether the variability of each original data block is significantly greater than that of the noise.

© 2010 Elsevier B.V. All rights reserved.

1. Introduction

Several multi-block analysis procedures exist for the simultaneous study of multiple sets of matrices with different variables describing the same samples (for example, see [1–4]).
These methods may be useful in chemometrics to combine information about the same set of samples contained in signals acquired using different techniques (IR spectroscopy, Raman spectroscopy, physico-chemical analyses, etc.). One such multi-block technique is “Common Component and Specific Weights Analysis” (CCSWA) [5].

The objective of multi-block analysis methods is to describe p data blocks observed for the same n samples (i.e. a set of p data matrices (X_i, i = 1 to p), each with n rows but not necessarily the same number of variables). The method consists in determining a common space for all p data blocks, with each matrix having a specific contribution (“salience”) to the definition of each dimension of this common space. This is done by finding the directions describing common distributions of the samples in the spaces defined by the different data blocks (hence the name Common Component, abbreviated CC, or Common Dimension, abbreviated CD). Salience indicates the importance of each block in the construction of the common dimension, and a “percentage of variability extracted” by each dimension can be computed. The particular implementation of CCSWA used in this work, “ComDim”, was developed and coded in Matlab [6] by D. Bertrand [7].

The work presented in this article shows that an interesting extension of ComDim is to use it in the analysis of sets of blocks calculated from a single initial data matrix. AComDim, presented here, is one such application, based on replacing the many separate PCAs performed in the ANOVA–PCA method [8], also abbreviated APCA, by a single analysis using ComDim.
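As an illustration only, the search for one common dimension and its saliences can be sketched in Python with NumPy. The sketch follows the iteration commonly described for CCSWA (sample cross-product matrices, salience-weighted sum, dominant eigenvector, salience update); it is not the Matlab ComDim code cited above. The function name `first_common_dimension`, the preprocessing (column centring and scaling each block to unit Frobenius norm), the stopping rule, and the synthetic data are all assumptions made for this example.

```python
import numpy as np


def first_common_dimension(blocks, max_iter=200, tol=1e-12):
    """Hypothetical helper: one common dimension q and one salience per block.

    blocks: list of p arrays, each with the same n rows (samples).
    """
    # Assumed preprocessing: centre each column, then scale the block
    # to unit Frobenius norm so that every block has the same total weight.
    scaled = []
    for X in blocks:
        Xc = X - X.mean(axis=0)
        scaled.append(Xc / np.linalg.norm(Xc))

    # Sample cross-product matrices W_i = X_i X_i^T, all of size n x n.
    W = [X @ X.T for X in scaled]

    saliences = np.ones(len(W))
    for _ in range(max_iter):
        # Salience-weighted sum of the cross-product matrices.
        WG = sum(s * Wi for s, Wi in zip(saliences, W))
        # eigh returns eigenvalues in ascending order, so the last
        # eigenvector is the dominant direction (unit norm).
        _, eigvecs = np.linalg.eigh(WG)
        q = eigvecs[:, -1]
        # Salience of block i: variability of block i along q.
        new_saliences = np.array([q @ Wi @ q for Wi in W])
        converged = np.linalg.norm(new_saliences - saliences) < tol
        saliences = new_saliences
        if converged:
            break
    return q, saliences


# Synthetic data (assumed, for illustration): two blocks share the same
# sample pattern t, the third block is pure noise.
rng = np.random.default_rng(0)
n = 20
t = rng.normal(size=n)
X1 = np.outer(t, rng.normal(size=5)) + 0.01 * rng.normal(size=(n, 5))
X2 = np.outer(t, rng.normal(size=8)) + 0.01 * rng.normal(size=(n, 8))
X3 = rng.normal(size=(n, 6))

q, saliences = first_common_dimension([X1, X2, X3])
print(saliences)  # one salience per block
```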
In this case, the various “Factor matrices” and “Interaction matrices” calculated from the initial data matrix are all analysed simultaneously, resulting in a series of “Common Components” along which the samples are distributed, each associated with a vector of “saliences” reflecting the importance of the contribution of each data block to the corresponding “Common Component”.

After a brief presentation of both the ComDim and the APCA methods, this article will present several real case studies, showing the interest of this new method, particularly in comparison to the standard APCA method.

Chemometrics and Intelligent Laboratory Systems 106 (2011) 173–182. doi:10.1016/j.chemolab.2010.05.005
⁎ Corresponding author. INRA, UMR 1145 Ingénierie Procédés Aliments, F-75005 Paris, France. Tel.: +33 1 44 08 16 39. E-mail address: delphine.bouveresse@agroparistech.fr (D. Jouan-Rimbaud Bouveresse).
0169-7439/$ – see front matter.

2. Theory

2.1. Notation

Matrices will be denoted by bold uppercase letters (e.g., X), column vectors will be denoted by bold lowercase letters (e.g., u), and row vectors by bold lowercase letters followed by the uppercase