Genetic Epidemiology 31: 189–194 (2007)

Tag SNPs Chosen From HapMap Perform Well in Several Population Isolates

Susan Service,1 The International Collaborative Group on Isolated Populations,2 Chiara Sabatti,3,4 and Nelson Freimer1,5,6*

1 Center for Neurobehavioral Genetics, University of California, Los Angeles, California
2 The International Collaborative Group on Isolated Populations members that are not listed separately as authors of this manuscript are: Maria Karayiorgou, J. Louw Roos, Herman Pretorious, Gabriel Bedoya, Jorge Ospina, Andres Ruiz-Linares, António Macedo, Joana Almeida Palha, Peter Heutink, Yurii Aulchenko, Ben Oostra, Cornelia van Duijn, Marjo-Riitta Jarvelin, Teppo Varilo, Lynette Peddle, Proton Rahman, Giovanna Piras, Maria Monne and Leena Peltonen
3 Department of Human Genetics, University of California, Los Angeles, California
4 Department of Statistics, University of California, Los Angeles, California
5 The Jane and Terry Semel Institute for Neuroscience and Human Behavior, University of California, Los Angeles, California
6 Department of Psychiatry, University of California, Los Angeles, California

Population isolates may be particularly useful for association studies of complex traits. This utility, however, largely depends on the transferability of tag SNPs chosen from reference samples, such as HapMap, to samples from such populations. Factors that characterize population isolates, such as widespread genetic drift, could impede such transferability. In this report, we show that tag SNPs chosen from HapMap perform well in several population isolates; this is true even for populations that differ substantially from the HapMap sample either in levels of linkage disequilibrium or in SNP allele frequency distributions. Genet. Epidemiol. 31:189–194, 2007. © 2007 Wiley-Liss, Inc.
Key words: tag SNP transferability; linkage disequilibrium; genome-wide association

Contract grant sponsor: NIH; Contract grant numbers: MH001375, MH049499, NS037484 and NS040024.
*Correspondence to: Nelson B Freimer, UCLA Center for Neurobehavioral Genetics, Gonda Center, Room 3506, 695 Charles E Young Drive South, Box 951761, Los Angeles CA 90095-1761. E-mail: nfreimer@mednet.ucla.edu
Received 21 July 2006; Accepted 8 November 2006
Published online 23 February 2007 in Wiley InterScience (www.interscience.wiley.com).
DOI: 10.1002/gepi.20201

INTRODUCTION

How to choose markers is a primary question in the design of genetic association studies. This is becoming less of a practical issue for whole-genome association studies where the cost advantages of off-the-shelf arrays typically over-ride other considerations. Most investigations of well-delineated regions of interest, however, such as fine mapping studies or evaluations of specific candidate genes, use custom designed sets of SNPs. Due to the efforts of large-scale SNP discovery programs such as the International HapMap Project [The International HapMap Consortium, 2005], for most such projects there is an overabundance of SNPs available. Therefore, in choosing SNPs for such studies investigators typically take advantage of the fact that, because of linkage disequilibrium (LD) among the SNPs, many SNPs present redundant information. Tag SNPs are those subsets of SNPs that together capture the majority of the information contained in the entire set of genotyped SNPs. Most algorithms use r² as a metric by which to choose tag SNPs [Carlson et al., 2004]. An r² = 0.8 is a common threshold for selecting tags, so that all variation is either typed directly as a tag, or is in LD with a tag SNP at a level of r² = 0.8 or greater.
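
To make the r² threshold concrete, the following is a minimal sketch of r²-based tag selection. It is not the procedure used in this study, nor a reimplementation of any published algorithm. It assumes phased haplotypes with alleles coded 0/1, and it uses a simple greedy rule (repeatedly pick the SNP that covers the most still-untagged SNPs at or above the threshold). The function names `r_squared` and `greedy_tags`, the SNP labels, and the toy data are all hypothetical.

```python
from itertools import combinations


def r_squared(a, b):
    """Pairwise LD r^2 for two biallelic SNPs given as 0/1 alleles
    on the same ordered set of phased haplotypes."""
    n = len(a)
    p_a = sum(a) / n                                  # frequency of allele 1 at SNP a
    p_b = sum(b) / n                                  # frequency of allele 1 at SNP b
    p_ab = sum(x & y for x, y in zip(a, b)) / n       # frequency of the 1-1 haplotype
    denom = p_a * (1 - p_a) * p_b * (1 - p_b)
    if denom == 0:
        return 0.0  # assumption: a monomorphic SNP is treated as r^2 = 0
    d = p_ab - p_a * p_b                              # disequilibrium coefficient D
    return d * d / denom


def greedy_tags(haplotypes, threshold=0.8):
    """Return tag SNP names such that every SNP is either a tag or has
    r^2 >= threshold with at least one tag.

    haplotypes: dict mapping SNP name -> list of 0/1 alleles.
    """
    names = list(haplotypes)
    # covers[s] = SNPs captured if s is typed (always includes s itself)
    covers = {s: {s} for s in names}
    for s, t in combinations(names, 2):
        if r_squared(haplotypes[s], haplotypes[t]) >= threshold:
            covers[s].add(t)
            covers[t].add(s)

    untagged = set(names)
    tags = []
    while untagged:
        # pick the SNP capturing the most untagged SNPs (ties: first in input order)
        best = max(names, key=lambda s: len(covers[s] & untagged))
        tags.append(best)
        untagged -= covers[best]
    return tags


# Toy data (hypothetical): 4 SNPs observed on 8 haplotypes.
toy = {
    "snpA": [0, 0, 0, 0, 1, 1, 1, 1],
    "snpB": [0, 0, 0, 0, 1, 1, 1, 1],   # perfect LD with snpA
    "snpC": [0, 0, 0, 1, 1, 1, 1, 1],   # moderate LD with snpA, below 0.8
    "snpD": [0, 1, 0, 1, 0, 1, 0, 1],   # independent of snpA
}
tags = greedy_tags(toy, threshold=0.8)
```

In this toy set, snpA and snpB are redundant, so one of them serves as the tag for both, while snpC and snpD must each be typed directly. Lowering the threshold would let fewer tags cover the same SNPs at the cost of weaker correlation with the untyped variation.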
As the values for r² for large numbers of pairs of SNPs are currently known only for reference samples such as those of HapMap, these datasets provide the basis for choosing tag SNPs for genotyping in association studies. Implicit in such use of HapMap data is the assumption that the tags chosen from HapMap populations (European Americans, West Africans, Japanese, and Han Chinese) will perform similarly in other samples from populations with similar ancestry.