Model Selection and Cross Validation in Additive Main Effect and Multiplicative Interaction Models

Carlos T. dos S. Dias* and Wojtek J. Krzanowski

C.T. dos S. Dias, Dep. of Ciências Exatas, Univ. of São Paulo/ESALQ, Av. Padua Dias 11, Cx.P.09, 13418-900, Piracicaba-SP, Brazil; W.J. Krzanowski, School of Mathematical Sciences, Laver Building, North Park Road, Exeter, EX4 4QE, UK. Received 8 Apr. 2002. *Corresponding author (ctsdias@carpa.ciagri.usp.br).

Published in Crop Sci. 43:865–873 (2003).

ABSTRACT

The additive main effects and multiplicative interaction (AMMI) model has been proposed for the analysis of genotype–environmental data. For plant breeding, the recovery of pattern might be considered to be the principal objective of analysis. However, some problems still remain with the analysis, notably in selecting the number of multiplicative components in the model. Methods based on distributional assumptions do not have a sound methodological basis, while existing data-based approaches do not optimize the cross-validation process. This paper first summarizes the AMMI model and outlines the available methodology for selecting the number of multiplicative components to include in it. Then two new methods are described that are based on a full "leave-one-out" procedure optimizing the cross-validation process. Both methods are illustrated and compared on some unstructured multivariate data. Finally, their applications to analysis of genotype × environment interaction (GEI) are demonstrated on experimental grain yield data. Conclusions of the study are that the "leave-one-out" procedure is preferable in practice to either distributional F-test or cross-validation randomization methods, and of the two "leave-one-out" procedures the Eastment-Krzanowski method exhibits the greater parsimony and stability.

Abbreviations: AMMI, additive main effects and multiplicative interaction model; COMM, completely multiplicative model; DF, degrees of freedom; GEI, genotype × environment interaction; GREG, genotype regression model; IPCA, interaction principal component analysis; MET, multi-environment trials; NID, normally and independently distributed; PCA, principal components analysis; PRESS, predictive sum of squares; PRECORR, predictive correlation; RMSPD, root mean square predictive difference; SHMM, shifted multiplicative model; SREG, sites regression model; SS, sum of squares; SVD, singular value decomposition.

Most of the data collected in agricultural experiments are multivariate in nature because several attributes are measured on each of the individuals included in the experiments, i.e., genotypes, agronomic treatments, etc. Such data can be arranged in a matrix X, where the (i, j)th element represents the value observed for the jth attribute measured on the ith individual (case) in the sample. Common multivariate techniques used to analyze such data include principal component analysis (PCA) if there is no a priori grouping of either individuals or variables; canonical variate or discriminant analysis if the individuals in the sample form a priori groups; canonical correlation analysis if the variables form a priori groups; and cluster analysis if some partitioning of the sample is sought.

In plant breeding, multienvironment trials (MET) are important for testing general and specific cultivar adaptation. A cultivar grown in different environments will frequently show significant fluctuation in yield performance relative to other cultivars. These changes are influenced by the different environmental conditions and are referred to as GEI. A typical example of a matrix X arises in the analysis of MET, in which the rows of X are the genotypes and the columns are the environments where the genotypes are tested. Presence of GEI rules out simple interpretative models that have only additive main effects of genotypes and environments (Mandel, 1971; Crossa, 1990; Kang and Magari, 1996). On the other hand, specific adaptation of genotypes to subsets of environments is a fundamental issue to be studied in plant breeding because one genotype may perform well under specific environmental conditions and may give a poor performance under other conditions.

Crossa et al. (2002) give a comprehensive review of the early approaches for analyzing GEI that include the conventional fixed two-way analysis of variance model, the linear regression approach, and the multiplicative models. The empirical mean response, $y_{ij}$, of the ith genotype in the jth environment with n replicates in each of the i × j cells is expressed as $y_{ij} = \mu + g_i + e_j + (ge)_{ij} + \varepsilon_{ij}$, where $\mu$ is the grand mean across all genotypes and environments, $g_i$ is the additive effect of the ith genotype, $e_j$ is the additive effect of the jth environment, $(ge)_{ij}$ is the GEI component for the ith genotype in the jth environment, and $\varepsilon_{ij}$ is the error assumed to be NID(0, $\sigma^2/n$) (where $\sigma^2$ is the within-environment error variance, assumed to be constant). This model is not parsimonious, because each GEI cell has its own interaction parameter, and it is uninformative, because the independent interaction parameters are complicated and difficult to interpret.

Yates and Cochran (1938) suggested treating the GEI term as being linearly related to the environmental effect, that is, setting $(ge)_{ij} = \xi_i e_j + d_{ij}$, where $\xi_i$ is the linear regression coefficient of the ith genotype on the environmental mean and $d_{ij}$ is a deviation. This approach was later used by Finlay and Wilkinson (1963) and slightly modified by Eberhart and Russell (1966). Tukey (1949) proposed a test for the GEI using $(ge)_{ij} = K g_i e_j$ (where K is a constant). Mandel (1961) generalized Tukey's model by letting $(ge)_{ij} = \alpha_i e_j$ for genotypes or $(ge)_{ij} = g_i \gamma_j$ for environments, thus obtaining a "bundle of straight lines" that may be tested for concurrence (i.e., whether the $\alpha_i$ or the $\gamma_j$ are all the same) or nonconcurrence.

Gollob (1968) and Mandel (1969, 1971) proposed a bilinear GEI term $(ge)_{ij} = \sum_{k=1}^{s} \lambda_k \alpha_{ik} \gamma_{jk}$ in which $\lambda_1$