Mapping Disease-Susceptibility Genes in Admixed Populations Using Interval Principal Component Tests

Wen-Chung Lee¹,² and Yu-Hsiang Shu¹

Received 22 Dec. 2003; Final 19 May 2004

The family-based association approach to mapping disease-susceptibility genes for complex human diseases is a topical issue in genetic epidemiology. It is well known that admixture between genetically differentiated populations can produce high levels of linkage disequilibrium between loci that are far apart. This property has been exploited to reduce the genotyping burden of a genomewide association scan. The authors describe a new approach to admixture mapping, the "interval principal component test" (IPCT). The genome is divided into many non-overlapping "intervals" (each 10–20 cM long), and the information from the markers within each interval is integrated using principal component analysis. Monte Carlo simulation shows that an interval-by-interval scan using the IPCT performs much better than a conventional marker-by-marker scan using the transmission/disequilibrium test (TDT).

KEY WORDS: Complex disease; epidemiologic methods; genetic epidemiology; disease-susceptibility gene; principal component analysis.

INTRODUCTION

The family-based association approach to mapping disease-susceptibility genes for complex human diseases is a topical issue in genetic epidemiology (Khoury and Yang, 1998; Risch and Merikangas, 1996; Schaid, 1998). In particular, the application of the transmission/disequilibrium test (TDT) in case–parents studies has received much attention (Ewens and Spielman, 1995; Schaid, 1998; Spielman and Ewens, 1996; Spielman et al., 1993). To scan the genome using the TDT, up to 10⁵ markers may have to be typed (Risch and Merikangas, 1996).
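For reference, the marker-by-marker baseline uses the classical TDT (Spielman et al., 1993), which compares, among parents heterozygous at a diallelic marker, how often each allele is transmitted to the affected child. The sketch below is a minimal illustration, assuming `b` and `c` are those two transmission counts and that the function names `tdt` and `marker_scan` are hypothetical: the statistic (b − c)²/(b + c) is referred to a chi-square distribution with 1 degree of freedom, whose upper-tail probability equals erfc(√(χ²/2)).

```python
import math


def tdt(b, c):
    """Classical transmission/disequilibrium test for one diallelic marker.

    b: transmissions of allele M1 from heterozygous (M1/M2) parents to affected children.
    c: transmissions of allele M2 from those same parents.
    Returns (chi-square statistic, p-value from a chi-square with 1 df).
    """
    if b < 0 or c < 0:
        raise ValueError("counts must be non-negative")
    if b + c == 0:
        raise ValueError("no transmissions from heterozygous parents")
    chi2 = (b - c) ** 2 / (b + c)
    # Upper tail of chi-square(1): P(X > x) = P(|Z| > sqrt(x)) = erfc(sqrt(x / 2)).
    p_value = math.erfc(math.sqrt(chi2 / 2.0))
    return chi2, p_value


def marker_scan(counts):
    """Apply the TDT marker by marker; `counts` maps a marker name to its (b, c) pair."""
    return {name: tdt(b, c) for name, (b, c) in counts.items()}


if __name__ == "__main__":
    # Hypothetical transmission counts for two markers.
    for name, (chi2, p) in marker_scan({"m1": (60, 40), "m2": (50, 50)}).items():
        print(name, chi2, round(p, 4))
```

With these hypothetical counts, 60 versus 40 transmissions give χ² = 4.0 and p ≈ 0.0455, while equal transmissions give χ² = 0 and p = 1.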
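The interval step summarized in the abstract (grouping markers into non-overlapping intervals of fixed genetic length and condensing each group with principal component analysis) can be sketched as follows. This illustrates only the grouping and PCA mechanics under stated assumptions; it is not the authors' IPCT statistic. The per-family marker score matrix fed to the PCA, the function names, and the example positions are all hypothetical, and the leading component is found here by plain power iteration on the sample covariance matrix.

```python
import math
from collections import defaultdict


def partition_into_intervals(positions_cm, interval_cm):
    """Group marker column indices into non-overlapping intervals [k*L, (k+1)*L) in cM."""
    groups = defaultdict(list)
    for col, pos in enumerate(positions_cm):
        groups[int(pos // interval_cm)].append(col)
    return dict(groups)


def leading_component(rows, n_iter=500, tol=1e-12):
    """First principal component of `rows` (families x markers, hypothetical scores).

    Returns (eigenvalue, eigenvector, per-family scores), using power iteration
    on the sample covariance matrix of the column-centered data.
    """
    n, p = len(rows), len(rows[0])
    if n < 2:
        raise ValueError("need at least two families")
    means = [sum(r[j] for r in rows) / n for j in range(p)]
    centered = [[r[j] - means[j] for j in range(p)] for r in rows]
    cov = [[sum(c[i] * c[j] for c in centered) / (n - 1) for j in range(p)]
           for i in range(p)]
    v = [1.0 + k for k in range(p)]  # arbitrary non-uniform starting vector
    norm = math.sqrt(sum(x * x for x in v))
    v = [x / norm for x in v]
    for _ in range(n_iter):
        w = [sum(cov[i][j] * v[j] for j in range(p)) for i in range(p)]
        norm = math.sqrt(sum(x * x for x in w))
        if norm == 0.0:
            break  # all markers constant: no variance to summarize
        w = [x / norm for x in w]
        converged = max(abs(a - b) for a, b in zip(w, v)) < tol
        v = w
        if converged:
            break
    eigval = sum(v[i] * sum(cov[i][j] * v[j] for j in range(p)) for i in range(p))
    scores = [sum(c[j] * v[j] for j in range(p)) for c in centered]
    return eigval, v, scores


def summarize_intervals(positions_cm, rows, interval_cm):
    """Replace each interval's markers by their first-PC scores (one value per family)."""
    out = {}
    for k, cols in sorted(partition_into_intervals(positions_cm, interval_cm).items()):
        sub = [[r[j] for j in cols] for r in rows]
        out[k] = leading_component(sub)[2]
    return out
```

Reducing each interval to one (or a few) component scores means a genome scan performs one test per interval rather than one per marker; how those scores are then turned into a test statistic is left to the paper's own IPCT definition.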
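Why admixture creates linkage disequilibrium between distant loci can be made concrete with the standard single-pulse admixture model, used here as an assumption (the paper's own simulation model may differ). If a fraction m of the admixed population descends from population 1 and 1 − m from population 2, and two loci have allele frequencies p₁, p₂ and q₁, q₂ in those parental populations (each assumed to be in linkage equilibrium), the disequilibrium immediately after admixture is D₀ = m(1 − m)(p₁ − p₂)(q₁ − q₂); under random mating it then decays as D_t = D₀(1 − θ)^t, where θ is the recombination fraction. The sketch converts map distance to θ with Haldane's map function (also an assumption), and all parameter values in it are hypothetical.

```python
import math


def admixture_ld(m, p1, p2, q1, q2):
    """LD coefficient D immediately after one pulse of admixture.

    m: fraction of the admixed population contributed by population 1.
    p1, p2: frequency of allele A at locus 1 in populations 1 and 2.
    q1, q2: frequency of allele B at locus 2 in populations 1 and 2.
    Assumes both parental populations are in linkage equilibrium.
    """
    return m * (1.0 - m) * (p1 - p2) * (q1 - q2)


def haldane_theta(distance_cm):
    """Recombination fraction for a map distance in cM (Haldane: no interference)."""
    return 0.5 * (1.0 - math.exp(-2.0 * distance_cm / 100.0))


def ld_after(d0, theta, generations):
    """D after t generations of random mating: D_t = D_0 * (1 - theta) ** t."""
    return d0 * (1.0 - theta) ** generations


if __name__ == "__main__":
    d0 = admixture_ld(m=0.2, p1=0.9, p2=0.1, q1=0.8, q2=0.2)  # hypothetical values
    for dist in (10.0, 20.0):
        theta = haldane_theta(dist)
        print(f"{dist} cM: theta={theta:.4f}, fraction of D0 left after 8 generations="
              f"{ld_after(d0, theta, 8) / d0:.3f}")
    print(f"unlinked: fraction left={ld_after(d0, 0.5, 8) / d0:.4f}")
```

With these hypothetical values D₀ = 0.0768, and after 8 generations markers 10 cM apart still retain about 47% of it, whereas unlinked loci (θ = 0.5) retain under 0.4%. This slow decay at moderate distances is what allows a sparse marker map to pick up admixture linkage disequilibrium.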
Such a high marker density (up to 10⁵ markers) ensures that at least one typed marker will be in sufficient linkage disequilibrium with a disease-susceptibility gene and will thus, it is hoped, give a positive TDT signal. It is well known that admixture between genetically differentiated populations can produce high levels of linkage disequilibrium even between loci that are far apart (10–20 cM) (Chakraborty and Weiss, 1988; McKeigue, 1997). Examples of admixed populations include African Americans, Mexican Americans, Hispanic Americans, and Anglo-Indians. This forms the basis for "mapping by admixture linkage disequilibrium" (MALD) (Stephens et al., 1994). For MALD, a sparse array of far fewer markers, say a total of 500 markers across the genome, may be all that is needed for adequate power (McKeigue, 1997; Kaplan et al., 1998). Of course, this convenience comes at a price: MALD has poor resolution for localizing a disease-susceptibility gene. An optimal design may therefore consist of two stages: an initial genomewide scan in an admixed population, followed by a fine-mapping study in a homogeneous population of those regions of the genome that show evidence of linkage disequilibrium in the initial scan (candidate regions) (see, for example, Morris et al., 2002).

The above genomewide scan (whether in a homogeneous or in an admixed population) is performed

¹ Graduate Institute of Epidemiology, College of Public Health, National Taiwan University.
² To whom correspondence should be addressed at Graduate Institute of Epidemiology, National Taiwan University, No. 1, Jen Ai Rd., Sec. 1, Taipei, Taiwan. Fax: +886 2 23511955. e-mail: wenchung@ha.mc.ntu.edu.tw

0001-8244/04/0900-0525/0 © 2004 Springer Science+Business Media, Inc.

Behavior Genetics, Vol. 34, No. 5, September 2004 (© 2004)