Tree Genetics & Genomes (2006) 2: 152–164
DOI 10.1007/s11295-006-0038-0

ORIGINAL PAPER

Cuauhtemoc Cervantes-Martinez · J. Steven Brown · Raymond Schnell · Juan C. Motamayor · Alan W. Meerow · Dapeng Zhang

A computer simulation study on the number of loci and trees required to estimate genetic variability in cacao (Theobroma cacao L.)

Received: 9 September 2005 / Revised: 21 November 2005 / Accepted: 30 January 2006 / Published online: 23 March 2006
© Springer-Verlag 2006

Abstract Current methods for measuring the genetic diversity of populations and germplasm collections are often based on statistics calculated from molecular markers. The objective of this study was to investigate the precision and accuracy of the most common estimators of genetic variability and population structure, as calculated from simple sequence repeat (SSR) marker data from cacao (Theobroma cacao L.). Computer-simulated genomes of replicate populations were generated from initial allele frequencies estimated from SSR data on cacao accessions in a germplasm collection. The simulated genomes consisted of ten linkage groups, each 100 cM long. Heterozygosity, gene diversity and the F statistics were studied as a function of the number of loci and trees sampled. The results showed that relatively small random samples of trees were sufficient to obtain consistent estimates. In contrast, very large random samples of loci per linkage group were required to enable reliable inferences about the whole genome. Increasing the sample size from one to five loci per linkage group (50 loci per genome) increased the precision of estimates by more than 50%, and increasing it to ten loci per linkage group (100 loci per genome) increased precision by up to 70%. The use of fewer, highly polymorphic loci to analyze genetic variability led to estimates with substantially smaller variance but with an upward bias.
Nevertheless, the relative differences in estimates among populations were generally consistent across the levels of polymorphism considered.

Introduction

Genetic diversity of plant species is a major concern among geneticists and plant breeders involved in preserving genetic variation. Current methods for measuring the genetic diversity of populations and germplasm collections are often based on statistics calculated from molecular marker data (Mohammadi and Prasanna 2003). Economic and practical constraints on studying large portions of the genome have commonly led researchers to characterize the genetic variation of populations with relatively few molecular markers. Numerous studies in plant species have used samples of individuals of different sizes and different marker densities (Ni et al. 2002; Liu et al. 2003; Reif et al. 2004; Ahmad et al. 2003; Gao et al. 2004). Marker loci are often selected on the basis of the number and frequency of alleles observed in preliminary screenings. The effect of sample size, in terms of both the number of individuals and the number of markers, on several measures of genetic distance has recently been studied by Kalinowski (2002a,b, 2005) using computer simulation. These studies yielded valuable information on the optimal sample size of individuals and of loci in linkage equilibrium at differing levels of polymorphism, under several divergence times, mutation rates, and models. However, his studies did not address optimal marker density in situations in which loci are in linkage disequilibrium, whether caused by physical linkage or by reproduction from only a few individuals in a population over several generations. In addition, the simulated population sizes and numbers of generations of random mating were too large to be representative of perennial plant species such as fruit trees. Finally, the numbers of markers studied were too small for the patterns of genetic distance estimates to be considered representative of genome-wide values.

C. Cervantes-Martinez (*) · J. S. Brown · R. Schnell · A. W. Meerow
United States Department of Agriculture-Agricultural Research Service (USDA-ARS), The Subtropical Horticulture Research Station (SHRS), Miami, FL 33158, USA
e-mail: ccervantes@saa.ars.usda.gov

J. C. Motamayor
Mars Inc., c/o United States Department of Agriculture-Agricultural Research Service (USDA-ARS), The Subtropical Horticulture Research Station (SHRS), Miami, FL 33158, USA

D. Zhang
United States Department of Agriculture-Agricultural Research Service (USDA-ARS), Beltsville, MD 20705, USA
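To make the estimators named in the Abstract more concrete, the following Python sketch computes observed heterozygosity (Ho), gene diversity (He) and two F statistics from SSR genotypes coded as pairs of allele sizes. The formulas are standard textbook definitions: Nei's gene diversity with the 2n/(2n-1) sample-size correction, F_IS = 1 - Ho/He, and F_ST computed as Nei's G_ST with equal population weights. This section does not state which estimators the authors used, so these choices and all function names are assumptions made for illustration.

```python
from collections import Counter
from typing import Callable, Dict, List, Sequence, Tuple

# Hypothetical coding: a diploid SSR genotype at one locus is a pair of allele sizes (bp).
Genotype = Tuple[int, int]


def allele_frequencies(genotypes: Sequence[Genotype]) -> Dict[int, float]:
    """Relative frequency of each allele among the 2n sampled gene copies."""
    counts = Counter(allele for g in genotypes for allele in g)
    total = sum(counts.values())
    return {allele: c / total for allele, c in counts.items()}


def observed_heterozygosity(genotypes: Sequence[Genotype]) -> float:
    """Ho: proportion of sampled trees that are heterozygous at the locus."""
    return sum(1 for a, b in genotypes if a != b) / len(genotypes)


def gene_diversity(genotypes: Sequence[Genotype], unbiased: bool = True) -> float:
    """He = 1 - sum(p_i^2), optionally multiplied by 2n/(2n - 1) (assumed correction)."""
    freqs = allele_frequencies(genotypes)
    he = 1.0 - sum(p * p for p in freqs.values())
    if unbiased:
        n2 = 2 * len(genotypes)
        he *= n2 / (n2 - 1)
    return he


def f_is(genotypes: Sequence[Genotype]) -> float:
    """Within-population inbreeding coefficient F_IS = 1 - Ho/He (0 at monomorphic loci)."""
    he = gene_diversity(genotypes)
    return 1.0 - observed_heterozygosity(genotypes) / he if he > 0 else 0.0


def f_st(populations: List[Sequence[Genotype]]) -> float:
    """F_ST as Nei's G_ST = (H_T - H_S) / H_T, equal population weights, no sample correction."""
    pop_freqs = [allele_frequencies(p) for p in populations]
    k = len(pop_freqs)
    # H_S: mean within-population gene diversity.
    h_s = sum(1.0 - sum(p * p for p in f.values()) for f in pop_freqs) / k
    # H_T: gene diversity of the pooled (mean) allele frequencies.
    alleles = set().union(*pop_freqs)
    mean_p = {a: sum(f.get(a, 0.0) for f in pop_freqs) / k for a in alleles}
    h_t = 1.0 - sum(p * p for p in mean_p.values())
    return (h_t - h_s) / h_t if h_t > 0 else 0.0


def mean_over_loci(stat: Callable[[Sequence[Genotype]], float],
                   loci: List[Sequence[Genotype]]) -> float:
    """Multilocus estimate: the per-locus statistic averaged over all sampled loci."""
    return sum(stat(g) for g in loci) / len(loci)


if __name__ == "__main__":
    locus = [(150, 150), (150, 152), (152, 152), (150, 152)]
    print(observed_heterozygosity(locus), gene_diversity(locus), f_is(locus))
```

Averaging a per-locus statistic over more loci is what reduces the sampling variance of the genome-wide estimate discussed in the Abstract; the sketch computes the estimators only and does not reproduce the paper's simulation.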
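The Introduction's point about loci in linkage disequilibrium caused by physical linkage can be illustrated with the Haldane map function, which assumes that crossovers occur as a Poisson process along the chromosome with no interference. The Python sketch below converts map distance on a 100 cM linkage group into a recombination fraction r = (1 - exp(-2d))/2 (d in Morgans), computes how much disequilibrium D survives t generations of random mating via D_t = D_0(1 - r)^t, and simulates crossovers on a single meiotic product. Haldane's model and all function names are assumptions; this section does not say which map function the authors' simulator used, and the decay formula ignores drift, so it does not capture disequilibrium generated by reproduction from only a few individuals.

```python
import math
import random
from typing import List


def haldane_r(distance_cm: float) -> float:
    """Recombination fraction between two loci distance_cm apart (Haldane, no interference)."""
    return 0.5 * (1.0 - math.exp(-2.0 * distance_cm / 100.0))


def ld_after(d0: float, distance_cm: float, generations: int) -> float:
    """D remaining after t generations of random mating in a large population: D0 * (1 - r)^t."""
    return d0 * (1.0 - haldane_r(distance_cm)) ** generations


def crossover_positions(length_cm: float, rng: random.Random) -> List[float]:
    """Crossover points (cM) on one meiotic product: Poisson process, rate 1 per Morgan."""
    positions: List[float] = []
    pos = rng.expovariate(1.0) * 100.0  # exponential gap drawn in Morgans, converted to cM
    while pos < length_cm:
        positions.append(pos)
        pos += rng.expovariate(1.0) * 100.0
    return positions


def recombinant(locus_a_cm: float, locus_b_cm: float, crossovers: List[float]) -> bool:
    """Two loci are recombinant when an odd number of crossovers falls between them."""
    lo, hi = sorted((locus_a_cm, locus_b_cm))
    return sum(1 for x in crossovers if lo < x < hi) % 2 == 1


if __name__ == "__main__":
    rng = random.Random(1)
    n = 20000
    hits = sum(recombinant(0.0, 50.0, crossover_positions(100.0, rng)) for _ in range(n))
    print(f"simulated r = {hits / n:.3f}, Haldane r = {haldane_r(50.0):.3f}")
```

For example, two loci 5 cM apart have r of about 0.048, so about 61% of their initial disequilibrium remains after ten generations of random mating, which is one reason closely spaced markers on the same linkage group cannot be treated as independent samples of the genome.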