Identification of significant factors by an extension of ANOVA-PCA based on multi-block analysis

D. Jouan-Rimbaud Bouveresse a,b,*, R. Climaco Pinto b,c, L.M. Schmidtke d, N. Locquet a,b, D.N. Rutledge a,b

a INRA, UMR 1145 Ingénierie Procédés Aliments, F-75005 Paris, France
b AgroParisTech, UMR 1145 Ingénierie Procédés Aliments, F-75005 Paris, France
c Computational Life Science Cluster (CliC), KBC, Umeå University, S-90187 Umeå, Sweden
d National Wine and Grape Industry Center, School of Agriculture and Wine Sciences, Charles Sturt University, Wagga Wagga, NSW 2650, Australia

Article history: Received 14 December 2009; received in revised form 10 May 2010; accepted 13 May 2010; available online 25 May 2010

Keywords: Multi-block analysis; Common Component and Specific Weights Analysis; ComDim; ANOVA-PCA; F-test

Abstract

A modification of the ANOVA-PCA method, proposed by Harrington et al. to identify significant factors and interactions in an experimental design, is presented in this article. The modified method uses the idea of multiple table analysis, and looks for the common dimensions underlying the different data tables, or data blocks, generated by the "ANOVA-step" of the ANOVA-PCA method, in order to identify the significant factors. In this paper, the "Common Component and Specific Weights Analysis" method is used to analyse the calculated multi-block data set. This new method, called AComDim, was compared to the standard ANOVA-PCA method by analysing four real data sets. Parameters computed during the AComDim procedure enable the computation of F-values to check whether the variability of each original data block is significantly greater than that of the noise.

© 2010 Elsevier B.V. All rights reserved.

1. Introduction

Several multi-block analysis procedures exist for the simultaneous study of multiple sets of matrices with different variables describing the same samples (for example, see [1-4]).
These methods may be useful in chemometrics to combine information about the same set of samples contained in signals acquired using different techniques (IR spectroscopy, Raman spectroscopy, physicochemical analyses, etc.). One such multi-block technique is "Common Component and Specific Weights Analysis" (CCSWA) [5]. The objective of multi-block analysis methods is to describe p data blocks observed for the same n samples, i.e. a set of p data matrices (X_i, i = 1 to p), each with n rows but not necessarily the same number of variables. The method consists of determining a common space for all p data blocks, with each matrix having a specific contribution ("salience") to the definition of each dimension of this common space. This is done by finding the directions describing common distributions of the samples in the spaces defined by the different data blocks (hence the name "Common Component", abbreviated CC, or "Common Dimension", abbreviated CD). The salience indicates the importance of each block in the construction of a common dimension, and a percentage of "variability extracted" by each dimension can be computed. The particular implementation of CCSWA used in this work, ComDim, was developed and coded in Matlab [6] by D. Bertrand [7].

The work presented in this article shows that an interesting extension of ComDim is to use it in the analysis of sets of blocks calculated from a single initial data matrix. AComDim, presented here, is one such application, based on replacing the many separate PCAs performed in the ANOVA-PCA method [8], also abbreviated APCA, by a single analysis using ComDim. In this case, the various "Factor matrices" and "Interaction matrices" calculated from the initial data matrix are all analysed simultaneously, resulting in a series of Common Components along which the samples are distributed, each associated with a vector of "saliences" reflecting the importance of the contribution of each data block to the corresponding Common Component.
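The common-space idea described above can be illustrated with a short NumPy sketch. It is not the Matlab ComDim code by D. Bertrand; it assumes the commonly described iterative scheme in which each block is centred and scaled to unit norm, a salience-weighted sum of the sample cross-product matrices is diagonalised, and the saliences are updated until they stabilise. The function name `comdim`, the parameters `tol` and `max_iter`, the normalisation choice and the synthetic blocks are all assumptions made for illustration.

```python
import numpy as np


def comdim(blocks, n_components=2, tol=1e-10, max_iter=500):
    """Illustrative CCSWA/ComDim-style decomposition (hypothetical helper).

    blocks: list of p arrays, each of shape (n, k_i), sharing the same n rows.
    Returns (scores, saliences): scores is (n, n_components) with unit-norm
    columns; saliences is (n_components, p).
    """
    # Column-centre each block and scale it to unit Frobenius norm so that no
    # block dominates merely because of its size or units (assumed choice).
    work = []
    for X in blocks:
        Xc = np.asarray(X, dtype=float)
        Xc = Xc - Xc.mean(axis=0)
        work.append(Xc / np.linalg.norm(Xc))

    n, p = work[0].shape[0], len(work)
    scores = np.zeros((n, n_components))
    saliences = np.zeros((n_components, p))

    for d in range(n_components):
        # Sample-by-sample cross-product matrices W_i = X_i X_i^T.
        W = [X @ X.T for X in work]
        lam = np.ones(p)
        for _ in range(max_iter):
            # Leading eigenvector of the salience-weighted sum of the W_i.
            WG = sum(l * Wi for l, Wi in zip(lam, W))
            q = np.linalg.eigh(WG)[1][:, -1]
            # Updated salience of each block on this common dimension.
            new_lam = np.array([q @ Wi @ q for Wi in W])
            converged = np.max(np.abs(new_lam - lam)) < tol
            lam = new_lam
            if converged:
                break
        scores[:, d] = q
        saliences[d] = lam
        # Deflate: remove the direction q from every block.
        work = [X - np.outer(q, q @ X) for X in work]

    return scores, saliences


# Synthetic example: two blocks share a hidden direction t, one is pure noise.
rng = np.random.default_rng(0)
n = 30
t = rng.normal(size=n)
X1 = np.outer(t, np.linspace(1.0, 2.0, 6)) + 0.05 * rng.normal(size=(n, 6))
X2 = np.outer(t, np.linspace(-1.0, 1.0, 10)) + 0.05 * rng.normal(size=(n, 10))
X3 = rng.normal(size=(n, 8))
scores, saliences = comdim([X1, X2, X3], n_components=2)
print(np.round(saliences, 3))
```

In this toy case the first common dimension should carry high saliences for the two blocks that share `t` and a low salience for the noise block. In an AComDim-style analysis the blocks would instead be the factor and interaction matrices derived from a single data matrix; how those blocks are built is not covered by this sketch.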
After a brief presentation of both the ComDim and the APCA methods, this article presents several real case studies that illustrate the value of the new method, particularly in comparison with the standard APCA method.

Chemometrics and Intelligent Laboratory Systems 106 (2011) 173-182. doi:10.1016/j.chemolab.2010.05.005
* Corresponding author. INRA, UMR 1145 Ingénierie Procédés Aliments, F-75005 Paris, France. Tel.: +33 1 44 08 16 39. E-mail address: delphine.bouveresse@agroparistech.fr (D. Jouan-Rimbaud Bouveresse).

2. Theory

2.1. Notation

Matrices will be denoted by bold uppercase letters (e.g., X), column vectors will be denoted by bold lowercase letters (e.g., u), and row vectors by bold lowercase letters followed by the uppercase