ORIGINAL INVESTIGATION

Use of autosomal loci for clustering individuals and populations of East Asian origin

Jong-Jin Kim, Paul Verdu, Andrew J. Pakstis, William C. Speed, Judith R. Kidd, Kenneth K. Kidd

Received: 13 September 2004 / Accepted: 14 April 2005 / Published online: 19 July 2005
© Springer-Verlag 2005

Abstract We studied the genetic relationships among East Asian populations on the basis of allele frequency differences. Our aim was to clarify the relative similarities of these populations, with a specific focus on the Koreans, the Japanese, and the Chinese, populations known to be genetically similar. The goal was to find markers appropriate for differentiating among these specific populations. No prior data existed for Koreans, and the markers had been selected to differentiate Chinese and Japanese. Using AB TaqMan assays, we typed single-nucleotide polymorphisms (SNPs) at 43 highly selected, mostly independent diallelic sites in 386 individuals from eight East Asian populations (Han Chinese from San Francisco, Han Chinese from Taiwan, Hakka, Koreans, Japanese, Ami, Atayal, and Cambodians) and one Siberian population (Yakut). We inferred the group membership of individuals with the model-based clustering method implemented in the STRUCTURE program. We clustered populations with the computer programs DISTANCE, NEIGHBOR, LSSEARCH, and DRAWTREE, which, respectively, calculate genetic distances among populations, calculate neighbor-joining trees, calculate least-squares trees, and draw the resulting trees. On average, 52% of individuals in the three Chinese groups were assigned to one cluster, whereas 78% of Koreans and 69% of Japanese were assigned to a different cluster. Koreans were differentiated from the Chinese groups and clustered with the Japanese both in the principal component analysis (PCA) and in the best least-squares tree. Most Koreans were difficult to distinguish from the Japanese.
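The population-level distance and tree-building steps named above can be illustrated with a short sketch. It is an illustration under stated assumptions, not the study's pipeline. The abstract does not say which distance DISTANCE computes, so the sketch uses Nei's standard genetic distance as one common choice. It applies the textbook neighbor-joining algorithm of Saitou and Nei in place of the NEIGHBOR program. The allele frequencies belong to four hypothetical populations (P1 to P4) and are toy values, not the study's data. The function names nei_distance and neighbor_joining are likewise hypothetical.

```python
import math
from itertools import combinations


def nei_distance(p, q):
    """Nei's standard genetic distance between two populations (assumed metric).

    p[k] and q[k] are the frequencies of one allele at diallelic locus k;
    the other allele has frequency 1 - p[k] (or 1 - q[k]).
    """
    jx = jy = jxy = 0.0
    for a, b in zip(p, q):
        jx += a * a + (1 - a) * (1 - a)
        jy += b * b + (1 - b) * (1 - b)
        jxy += a * b + (1 - a) * (1 - b)
    # Averaging each sum over loci would cancel in this ratio, so raw sums suffice.
    return -math.log(jxy / math.sqrt(jx * jy))


def neighbor_joining(dist):
    """Textbook neighbor joining (Saitou and Nei), not the NEIGHBOR program itself.

    dist: {taxon: {other_taxon: distance}}, symmetric.
    Returns (newick_string, joins), where joins lists the merged pairs in order.
    """
    D = {a: dict(row) for a, row in dist.items()}  # copy so the input is not mutated
    nodes = list(D)
    joins = []
    while len(nodes) > 2:
        n = len(nodes)
        # Net divergence of each current node from all other current nodes.
        r = {a: sum(D[a][b] for b in nodes if b != a) for a in nodes}
        # Join the pair that minimizes the Q criterion (first pair wins ties).
        i, j = min(combinations(nodes, 2),
                   key=lambda ij: (n - 2) * D[ij[0]][ij[1]] - r[ij[0]] - r[ij[1]])
        li = D[i][j] / 2 + (r[i] - r[j]) / (2 * (n - 2))
        lj = D[i][j] - li
        u = f"({i}:{li:.4f},{j}:{lj:.4f})"  # new internal node, labeled by its subtree
        D[u] = {}
        for k in nodes:
            if k not in (i, j):
                D[u][k] = D[k][u] = (D[i][k] + D[j][k] - D[i][j]) / 2
        nodes = [k for k in nodes if k not in (i, j)] + [u]
        joins.append((i, j))
    a, b = nodes
    return f"({a},{b}:{D[a][b]:.4f});", joins


# Hypothetical allele frequencies at four diallelic loci (toy values, not study data).
freqs = {
    "P1": [0.10, 0.80, 0.45, 0.30],
    "P2": [0.15, 0.75, 0.50, 0.35],
    "P3": [0.60, 0.20, 0.40, 0.70],
    "P4": [0.55, 0.25, 0.35, 0.75],
}
dist = {a: {b: nei_distance(freqs[a], freqs[b]) for b in freqs if b != a} for a in freqs}
newick, joins = neighbor_joining(dist)
print(newick)
```

With these toy frequencies the first join pairs P1 with P2 or P3 with P4. With only four taxa those two pairs tie exactly under the neighbor-joining criterion, so which one is reported first depends on iteration order and floating-point rounding. A least-squares tree search of the kind LSSEARCH performs is a different procedure and is not shown here.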
This study shows that a relatively small number of highly selected markers can, within limits, differentiate between closely related populations.

Electronic Supplementary Material Supplementary material is available for this article at http://dx.doi.org/10.1007/s00439-005-1334-8

J.-J. Kim
National Institute of Scientific Investigation, DNA Analysis Division, Seoul, Korea

P. Verdu, A. J. Pakstis, W. C. Speed, J. R. Kidd, K. K. Kidd (corresponding author)
Department of Genetics, Yale University School of Medicine, 333 Cedar Street, New Haven, CT 06520, USA
E-mail: Kenneth.Kidd@yale.edu
Tel.: +1-203-7852654
Fax: +1-203-7856568

Hum Genet (2005) 117: 511–519
DOI 10.1007/s00439-005-1334-8

Introduction

The number of confirmed DNA polymorphisms detectable directly in DNA and defined in various databases increased from fewer than 200 in 1981 (HGM6 1981) to more than nine million in 2004 (dbSNP build 121), of which more than four million have been validated. The majority of these are single-nucleotide polymorphisms (SNPs). SNPs are clearly the most plentiful genetic variants in the human genome, and many of them have high heterozygosities. This makes them very useful DNA markers for research on the genetic structure of populations and the ethnic origins of individuals within a population (Frudakis et al. 2003; Rosenberg et al. 2003). Genetic relationships and genetic similarities among populations can be determined from similarities and differences in SNP allele frequencies (Osier et al. 2002; Collins-Schramm et al. 2004; Fullerton et al. 2004; Kidd et al. 2004). Many different methods exist for analyzing allele frequency data on populations and for representing the resulting relationships. Here we apply many different analytic approaches to a highly selected dataset designed to quantify the relative similarities of East Asian populations, with a specific focus on the relative similarities of Koreans, Japanese, and Chinese.

Korea is an important region for understanding the population structure and origins of East Asians because of its location in Northeast Asia between China and Japan. There are many competing arguments about the origin of East Asian populations (Yao et al. 2002). Major issues in-