ARTICLE

Model Identification in Presence of Incomplete Information by Generalized Principal Component Analysis: Application to the Common and Differential Responses of Escherichia coli to Multiple Pulse Perturbations in Continuous, High-Biomass Density Culture

Daniel V. Guebel,¹ Manuel Cánovas,² Néstor V. Torres³

¹ Biotechnology Counseling Services, Buenos Aires, Argentina
² Departamento de Bioquímica y Biología Molecular B, Facultad de Química, Universidad de Murcia, Spain
³ Grupo de Tecnología Bioquímica, Departamento de Bioquímica y Biología Molecular, Facultad de Biología, Universidad de La Laguna, 38206 La Laguna, Tenerife, Islas Canarias, Spain; telephone: +34-922-318334; fax: +34-922-318354; e-mail: ntorres@ull.es

Received 12 March 2009; revision received 5 May 2009; accepted 29 May 2009
Published online 8 June 2009 in Wiley InterScience (www.interscience.wiley.com). DOI 10.1002/bit.22438

ABSTRACT: In a previous report we described a multivariate approach to discriminate between the different response mechanisms operating in Escherichia coli when a steady-state continuous culture of these bacteria was perturbed by a glycerol pulse (Guebel et al., 2009, Biotechnol Bioeng 102: 910–922). Herein, we present a procedure that extends this analysis to the case where multiple, spaced pulse perturbations (glycerol, fumarate, acetate, crotonobetaine, hypersaline plus high-glycerol basal medium, and crotonobetaine plus hypersaline basal medium) are assessed. The proposed method allows us not only to identify the responses common to different perturbation conditions, but also to recognize the specific response to a given stimulus, even when the dynamics of the perturbation are unknown. Components common to all conditions are first determined by Generalized Principal Component Analysis (GPCA) applied to a set of covariance matrices. A metric is then built to quantify the similarity distance.
This metric is based on the degree of variance extraction that the common factors achieve for each variable during the GPCA deflation processes. It permits a cluster analysis that recognizes several compact subsets, each containing only the most closely related responsive groups. GPCA is then run again, but restricted to the groups in each subset. Finally, after the data have been exhaustively deflated by the common subset factors, the resulting residual matrices are used to determine the specific response factors by classical principal component analysis (PCA). The proposed method was validated by comparing its predictions with those obtained when the dynamics of the perturbation were determined. In addition, it performed better than other multivariate alternatives (e.g., orthogonal contrasts based on direct GPCA, the Tucker-3 model, and PARAFAC). Biotechnol. Bioeng. 2009;104: 785–795. © 2009 Wiley Periodicals, Inc.

KEYWORDS: Escherichia coli; glycerol pulse; fumarate pulse; acetate pulse; crotonobetaine pulse; carnitine pulse; hypersaline pulse; multivariate analysis; Tucker-3 model; generalized principal component analysis; regulatory network; stress–response; microbial physiology

Introduction

Multivariate methods ("chemometrics") are increasingly used in biological research (for reviews see Eriksson et al., 2004; Robertson et al., 2007; van der Werf et al., 2005). Their wide adoption is due to the capacity of these methods to deal with highly complex data series, revealing their "latent structures" (Johnson and Wichner, 1998a; Wall et al., 2003), while also allowing the multivariate observations themselves to be grouped (D'haeselee, 2005; Johnson and Wichner, 1998b). Together, these two properties can make the relationships underlying many biological phenomena more interpretable.
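The pipeline summarized in the abstract (common factors extracted from a set of covariance matrices, deflation, a per-variable variance-extraction metric for comparing conditions, and PCA on the residuals) can be sketched in a few lines of NumPy. This is a minimal illustration under stated assumptions, not the authors' implementation: the common factor is taken as the leading eigenvector of the averaged covariance matrices (a simple stand-in for GPCA), deflation regresses each variable on the factor score, the similarity metric is a Euclidean distance between extraction profiles, and the function names (`common_factors`, `condition_distances`, `specific_factors`) and the toy covariance matrices are hypothetical.

```python
import numpy as np


def common_factors(covs, n_factors):
    """Extract factors shared by all conditions and deflate each covariance.

    Assumption: each common factor is the leading eigenvector of the average
    of the current (deflated) covariance matrices. This pooled-eigenvector
    rule is a simple stand-in for GPCA, not the paper's algorithm.
    """
    covs = [np.asarray(c, dtype=float).copy() for c in covs]
    n_cond, p = len(covs), covs[0].shape[0]
    start_var = np.array([np.diag(c) for c in covs])  # (n_cond, p)
    safe_var = np.where(start_var > 0, start_var, 1.0)  # avoid 0/0
    factors = np.zeros((n_factors, p))
    # extracted[k, f, j]: share of variable j's original variance in
    # condition k that is removed by common factor f
    extracted = np.zeros((n_cond, n_factors, p))
    for f in range(n_factors):
        pooled = sum(covs) / n_cond
        _, vecs = np.linalg.eigh(pooled)  # eigenvalues in ascending order
        w = vecs[:, -1]                   # leading pooled direction
        factors[f] = w
        for k, c in enumerate(covs):
            cw = c @ w
            score_var = w @ cw            # variance of the score t = X w
            if score_var <= 1e-12:
                continue                  # factor carries no variance here
            # variance of each variable explained by regressing it on t
            removed = np.outer(cw, cw) / score_var
            extracted[k, f] = np.diag(removed) / safe_var[k]
            covs[k] = c - removed         # deflated (residual) covariance
    return factors, extracted, covs


def condition_distances(extracted):
    """Euclidean distance between the extraction profiles of conditions."""
    profiles = extracted.reshape(extracted.shape[0], -1)
    diff = profiles[:, None, :] - profiles[None, :, :]
    return np.sqrt((diff ** 2).sum(axis=-1))


def specific_factors(residual_covs):
    """Classical PCA on each residual matrix: keep its leading eigenvector."""
    return np.array([np.linalg.eigh(c)[1][:, -1] for c in residual_covs])


if __name__ == "__main__":
    # Invented toy data: variable 0 responds in both conditions,
    # variable 1 only in condition A, variable 2 only in condition B.
    cov_a = np.diag([10.0, 3.0, 0.5])
    cov_b = np.diag([10.0, 0.5, 3.0])
    factors, extracted, residuals = common_factors([cov_a, cov_b], n_factors=1)
    print(np.round(np.abs(factors), 3))
    print(np.round(condition_distances(extracted), 3))
    print(np.round(np.abs(specific_factors(residuals)), 3))
```

In this toy case the common factor lies along variable 0 and removes all of its variance in both conditions, so the two extraction profiles coincide; the residual PCA then recovers a different specific direction for each condition. In a fuller version, the distance matrix would be passed to a hierarchical clustering routine to form the compact subsets, and `common_factors` would be rerun within each subset before the residual PCA; those steps are omitted here for brevity.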
Correspondence to: N.V. Torres
Contract grant sponsor: MICINN (Spain)
Contract grant number: BIO2008-04500-C02-02; BIO2008-04500-C02-01
Additional Supporting Information may be found in the online version of this article.
© 2009 Wiley Periodicals, Inc. Biotechnology and Bioengineering, Vol. 104, No. 4, November 1, 2009

In a previous work, we reported a chemometric approach for analyzing the response of bacterial cells (Escherichia coli)