Rapid Assessment of Genetic Ancestry in Populations of Unknown Origin by Genome-Wide Genotyping of Pooled Samples

Charleston W. K. Chiang 1,2,3, Zofia K. Z. Gajdos 1,2,3, Joshua M. Korn 2,4,5, Finny G. Kuruvilla 2,4,5, Johannah L. Butler 3, Rachel Hackett 2, Candace Guiducci 2, Thutrang T. Nguyen 3, Rainford Wilks 6, Terrence Forrester 7, Christopher A. Haiman 8, Katherine D. Henderson 9, Loic Le Marchand 10, Brian E. Henderson 8, Mark R. Palmert 11,12, Colin A. McKenzie 7, Helen N. Lyon 2,3, Richard S. Cooper 13, Xiaofeng Zhu 14, Joel N. Hirschhorn 1,2,3 *

1 Department of Genetics, Harvard Medical School, Boston, Massachusetts, United States of America
2 Program in Medical and Population Genetics, Broad Institute of Harvard and Massachusetts Institute of Technology, Cambridge, Massachusetts, United States of America
3 Program in Genomics and Divisions of Genetics and Endocrinology, Children’s Hospital, Boston, Massachusetts, United States of America
4 Center for Human Genetic Research, Massachusetts General Hospital, Boston, Massachusetts, United States of America
5 Department of Molecular Biology, Massachusetts General Hospital, Boston, Massachusetts, United States of America
6 Epidemiology Research Unit, Tropical Medicine Research Institute, University of the West Indies, Kingston, Jamaica
7 Tropical Metabolism Research Unit, Tropical Medicine Research Institute, University of the West Indies, Kingston, Jamaica
8 Department of Preventive Medicine, Keck School of Medicine, University of Southern California, Los Angeles, California, United States of America
9 Department of Population Sciences, Division of Cancer Etiology, City of Hope National Medical Center, Duarte, California, United States of America
10 Epidemiology Program, Cancer Research Center of Hawaii, University of Hawaii, Honolulu, Hawaii, United States of America
11 Division of Endocrinology, The Hospital for Sick Children, Toronto, Ontario, Canada
12 Department of Pediatrics, University of Toronto, Toronto, Ontario, Canada
13 Department of Preventive Medicine and Epidemiology, Stritch School of Medicine, Loyola University Chicago, Maywood, Illinois, United States of America
14 Department of Biostatistics and Epidemiology, Case Western Reserve University, Cleveland, Ohio, United States of America

Abstract

As we move forward from the current generation of genome-wide association (GWA) studies, additional cohorts of different ancestries will be studied to increase power, fine-map association signals, and generalize association results to additional populations. Knowledge of genetic ancestry as well as population substructure will become increasingly important for GWA studies in populations of unknown ancestry. Here we propose genotyping pooled DNA samples using genome-wide SNP arrays as a viable option to efficiently and inexpensively estimate admixture proportions and identify ancestry-informative markers (AIMs) in populations of unknown origin. We constructed DNA pools from African American, Native Hawaiian, Latina, and Jamaican samples and genotyped them using the Affymetrix 6.0 array. Aided by individual genotype data from the African American cohort, we established quality control filters to remove poorly performing SNPs and estimated allele frequencies for the remaining SNPs in each panel. We then applied a regression-based method to estimate the proportion of admixture in each cohort, using the allele frequencies estimated from pooling and populations from the International HapMap Consortium as reference panels, and identified AIMs unique to each population. In this study, we demonstrated that genotyping pooled DNA samples yields estimates of admixture proportions that are both consistent with our knowledge of population history and similar to those obtained by genotyping known AIMs.
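The regression step summarized above can be illustrated with a minimal sketch. It assumes that the pooled allele frequency of an admixed cohort at each SNP is a linear mixture of the reference-panel frequencies (for example, HapMap panels), with mixing proportions that sum to 1, and fits those proportions by ordinary least squares. This is one simple formulation, not necessarily the exact regression used in this study; the function names and the toy frequencies are hypothetical, and no non-negativity constraint is imposed.

```python
from typing import List, Sequence


def solve_linear(a: List[List[float]], b: List[float]) -> List[float]:
    """Solve the square system a @ x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]  # augmented matrix [a | b]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        if abs(m[pivot][col]) < 1e-12:
            raise ValueError("singular system: reference panels are collinear")
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):  # back-substitution
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x


def estimate_admixture(target: Sequence[float],
                       references: Sequence[Sequence[float]]) -> List[float]:
    """Least-squares admixture proportions constrained to sum to 1.

    target[i]: allele frequency of SNP i in the admixed (pooled) sample.
    references[k][i]: allele frequency of SNP i in reference panel k.
    Model: target = sum_k alpha_k * references[k] + error, sum_k alpha_k = 1.
    The constraint is imposed by using the last panel as a baseline:
        target - ref_K = sum_{k<K} alpha_k * (ref_k - ref_K)
    """
    K = len(references)
    base = references[-1]
    y = [t - b for t, b in zip(target, base)]
    xs = [[r - b for r, b in zip(ref, base)] for ref in references[:-1]]
    # Normal equations (X^T X) alpha = X^T y for the first K - 1 proportions.
    xtx = [[sum(u * v for u, v in zip(xs[j], xs[k])) for k in range(K - 1)]
           for j in range(K - 1)]
    xty = [sum(u * v for u, v in zip(xs[j], y)) for j in range(K - 1)]
    alphas = solve_linear(xtx, xty)
    return alphas + [1.0 - sum(alphas)]


# Toy (made-up) frequencies at five SNPs for two hypothetical reference panels.
ref_1 = [0.9, 0.1, 0.6, 0.3, 0.75]
ref_2 = [0.2, 0.7, 0.1, 0.8, 0.4]
pooled = [0.8 * a + 0.2 * b for a, b in zip(ref_1, ref_2)]
print([round(p, 3) for p in estimate_admixture(pooled, [ref_1, ref_2])])  # [0.8, 0.2]
```

Using the last panel as a baseline turns the sum-to-one constraint into an unconstrained regression with K − 1 unknowns. With real pooled data, SNPs failing quality control would be removed first, and estimated proportions falling outside [0, 1] could signal a poorly matched set of reference panels.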
Furthermore, through validation by individual genotyping, we demonstrated that pooling is quite effective for identifying SNPs with large allele frequency differences (i.e., AIMs) and that these AIMs are able to differentiate two closely related populations (HapMap JPT and CHB).

Citation: Chiang CWK, Gajdos ZKZ, Korn JM, Kuruvilla FG, Butler JL, et al. (2010) Rapid Assessment of Genetic Ancestry in Populations of Unknown Origin by Genome-Wide Genotyping of Pooled Samples. PLoS Genet 6(3): e1000866. doi:10.1371/journal.pgen.1000866

Editor: Bruce Walsh, University of Arizona, United States of America

Received October 6, 2009; Accepted January 30, 2010; Published March 5, 2010

Copyright: © 2010 Chiang et al. This is an open-access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.

Funding: This work was supported by a graduate research fellowship from NSF to CWKC and grants from National Institutes of Health to JNH (R01DK075787), to RSC (R37HL45508 and R01HL53353), to XZ (R01HL074166), and to MRP (R01HD048960). The funders had no role in study design, data collection and analysis, decision to publish, or preparation of the manuscript.

Competing Interests: The authors have declared that no competing interests exist.

* E-mail: joelh@broadinstitute.org

Introduction

Genetic ancestry, as studied through DNA sequence variation, has shed light on the history, migration patterns, and relationships among human populations [1,2]. In the context of medical population genetics, genetic ancestry forms the basis of admixture mapping [3].
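Selecting AIMs, described in the abstract as SNPs with large allele frequency differences between populations, can be sketched as ranking SNPs by the absolute frequency difference δ = |p_A − p_B| and keeping those above a cutoff. The function name, the default cutoff of 0.5, and the SNP identifiers and frequencies below are illustrative assumptions rather than the study's actual criteria or data; with pooled frequency estimates, candidates near the cutoff would warrant confirmation by individual genotyping.

```python
from typing import Dict, List, Tuple


def rank_aims(freq_a: Dict[str, float], freq_b: Dict[str, float],
              min_delta: float = 0.5) -> List[Tuple[str, float]]:
    """Return (snp_id, delta) pairs with delta = |p_a - p_b| >= min_delta.

    freq_a, freq_b: allele frequency per SNP in populations A and B.
    Only SNPs present in both populations are compared. Results are sorted by
    decreasing delta, ties broken by SNP identifier so the output is stable.
    min_delta is an illustrative cutoff, not a value taken from the study.
    """
    shared = freq_a.keys() & freq_b.keys()
    deltas = ((snp, abs(freq_a[snp] - freq_b[snp])) for snp in shared)
    hits = [(snp, d) for snp, d in deltas if d >= min_delta]
    return sorted(hits, key=lambda h: (-h[1], h[0]))


# Toy (made-up) pooled frequencies; SNP identifiers are placeholders.
pop_a = {"snp1": 0.95, "snp2": 0.40, "snp3": 0.10, "snp4": 0.55}
pop_b = {"snp1": 0.15, "snp2": 0.35, "snp3": 0.70, "snp5": 0.20}
for snp, delta in rank_aims(pop_a, pop_b):
    print(snp, round(delta, 2))
# snp1 0.8
# snp3 0.6
```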
Additionally, genetic ancestry is useful for properly matching cases and controls and is also an important covariate to consider in association studies of complex human traits [4,5], as spurious associations around variants with large allele frequency differences between populations have long been recognized as potential confounders [6–9]. For admixed populations, having an estimated proportion of genetic ancestry attributable to each ancestral population (i.e., the admixture proportion) would also
PLoS Genetics | www.plosgenetics.org 1 March 2010 | Volume 6 | Issue 3 | e1000866