A CELLULAR AUTOMATA APPROACH TO DETECTING INTERACTIONS AMONG SINGLE-NUCLEOTIDE POLYMORPHISMS IN COMPLEX MULTIFACTORIAL DISEASES

JASON H. MOORE, Ph.D., LANCE W. HAHN, Ph.D.

Program in Human Genetics, Department of Molecular Physiology and Biophysics, 519 Light Hall, Vanderbilt University Medical School, Nashville, TN 37232-0700, USA
Moore@phg.mc.Vanderbilt.edu

Pacific Symposium on Biocomputing 7:53-64 (2002)

The identification and characterization of susceptibility genes for common complex multifactorial human diseases remain a statistical and computational challenge. Parametric statistical methods such as logistic regression are limited in their ability to identify genes whose effects depend solely or partially on interactions with other genes and environmental exposures. We introduce cellular automata (CA) as a novel computational approach for identifying combinations of single-nucleotide polymorphisms (SNPs) associated with clinical endpoints. This alternative approach is nonparametric (i.e., no hypothesis about the value of a statistical parameter is made), is model-free (i.e., assumes no particular inheritance model), and is directly applicable to case-control and discordant sib-pair study designs. We demonstrate using simulated data that the approach has good power for identifying high-order nonlinear interactions (i.e., epistasis) among four SNPs in the absence of independent main effects.

1 Introduction

The idea that epistasis, or gene-gene interaction, plays an important role in human biology is not new. In fact, Wright [1] emphasized that the relationship between genes and biological endpoints depends on dynamic interactive networks of genes and environmental factors. This idea holds true today. Gibson [2] stresses that gene-gene and gene-environment interactions must be ubiquitous given the complexities of the intermolecular interactions that are necessary to regulate gene expression and the hierarchical complexity of metabolic networks. Indeed, there is increasing statistical and epidemiological evidence that epistasis is very common [3]. For example, in a study of 200 sporadic breast cancer subjects, Ritchie et al. [4] demonstrated a statistically significant interaction among four polymorphisms in three estrogen metabolism genes in the absence of any independent main effects. Further, Nelson et al. [5] found that epistatic effects of lipid genes on lipid traits were very common.

Despite the importance of epistasis in human biology, few statistical methods are capable of identifying interactions among more than two polymorphisms in relatively small sample sizes. For example, logistic regression is a commonly used method for modeling the relationship between discrete predictors such as genotypes and discrete clinical outcomes [6]. However, logistic regression, like most parametric statistical methods, is limited in its ability to deal