Recent Origin of Plasmodium falciparum from a Single Progenitor

Sarah K. Volkman (1, 2)*, Alyssa E. Barry (1, 3)*, Emily J. Lyons (1, 3), Kaare M. Nielsen (1, 4, 5), Susan M. Thomas (1, 2), Mehee Choi (1, 4), Seema S. Thakore (1, 2), Karen P. Day (1, 3), Dyann F. Wirth (1), Daniel L. Hartl (1, 4)

Genetic variability of Plasmodium falciparum underlies its transmission success and thwarts efforts to control disease caused by this parasite. Genetic variation in antigenic, drug resistance, and pathogenesis determinants is abundant, consistent with an ancient origin of P. falciparum, whereas DNA variation at silent (synonymous) sites in coding sequences appears virtually absent, consistent with a recent origin of the parasite. To resolve this paradox, we analyzed introns and demonstrated that these are deficient in single-nucleotide polymorphisms, as are synonymous sites in coding regions. These data establish the recent origin of P. falciparum and further provide an explanation for the abundant diversity observed in antigen and other selected genes.

Plasmodium falciparum causes the most virulent form of human malaria, resulting in 200 million to 300 million infections and 1 million to 3 million deaths annually (1). Genetic variation within this human pathogen facilitates its transmission and pathogenesis and limits efforts to combat the disease. For P. falciparum, however, the extent and origin of this variation are controversial (2, 3). Genetic variation in proteins for antigenic determinants (4), drug resistance (5–8), and pathogenesis is abundant (9–13), whereas DNA variation at silent (synonymous) sites in coding sequences appears virtually absent (14). Nevertheless, microsatellite variation within and among subpopulations is widespread (15, 16). These discrepancies could be reconciled if all extant P. falciparum derived from a single progenitor that spread through the human population within the past few thousand years (14).
Alternatively, codon usage may be so constrained that synonymous mutations are eliminated by selection. To resolve these possibilities, we analyzed 25 introns from eight independent isolates and found only eight single-nucleotide polymorphisms (SNPs), five of which occur within microsatellite repeats. In contrast, microsatellite polymorphisms are common within introns. Our results support the recent progenitor hypothesis and imply a high mutation rate for the creation of microsatellite repeats.

We chose introns for our analysis because introns are subject to selective constraints that differ from those for codon usage. Apart from pseudogenes, introns are among the most rapidly evolving sequences in eukaryotes (17), and they are of general utility in studies of population structure (18). We chose to analyze introns from general metabolic or housekeeping genes on chromosome 2 (19) and chromosome 3 (20), the first chromosomes completely sequenced, and examined these regions in each of eight independent isolates from diverse geographic regions including Africa, Honduras, Southeast Asia, and Papua New Guinea (21). For each intron, the target sequence was amplified by the polymerase chain reaction and the products were cloned and sequenced (21). To guard against polymerase incorporation errors, we sequenced each intron in both directions from each of three clones derived from each of three independent amplifications (21).

The results demonstrate that microsatellite variation in P. falciparum is widespread within introns (Table 1), which is consistent with previous results (16, 22). Across all introns there are 71 microsatellite repeats, which we define as a region of eight or more tandem repeats of a sequence 1 to 8 base pairs (bp) in length. Among the microsatellite repeats, 36 (51%) are monomorphic and 35 (49%) are polymorphic with two or more alleles in the sample [Web fig. 1 (21)].
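The microsatellite definition used above (eight or more tandem copies of a 1- to 8-bp unit) can be applied mechanically to a sequence. The Python sketch below is a hypothetical illustration, not the authors' pipeline: the function name `find_microsatellites`, the greedy left-to-right scan, and the rule of reporting the shortest qualifying repeat unit are all assumptions made for the example.

```python
"""Sketch: scan a DNA sequence for microsatellites, using the definition
of eight or more tandem copies of a 1- to 8-bp unit.
Hypothetical helper for illustration; not the authors' software."""


def find_microsatellites(seq, min_copies=8, max_unit=8):
    """Return (start, unit, copies) tuples for perfect tandem repeats.

    Scans left to right; at each position the shortest unit length
    (1..max_unit) giving at least min_copies consecutive exact copies
    wins, and scanning resumes just past the end of that repeat.
    """
    seq = seq.upper()
    hits = []
    i = 0
    while i < len(seq):
        found = None
        for k in range(1, max_unit + 1):
            unit = seq[i:i + k]
            if len(unit) < k:  # too close to the end for this unit length
                break
            copies = 1
            # Count further exact copies of the unit immediately downstream.
            while seq[i + copies * k:i + (copies + 1) * k] == unit:
                copies += 1
            if copies >= min_copies:
                found = (i, unit, copies)
                break
        if found:
            hits.append(found)
            i += found[2] * len(found[1])  # skip the whole repeat region
        else:
            i += 1
    return hits


if __name__ == "__main__":
    # Invented toy sequence: an (AT)10 repeat flanked by GC dinucleotides.
    print(find_microsatellites("GC" + "AT" * 10 + "GC"))
```

On the toy sequence this reports a single (AT)10 repeat starting at position 2. The sketch only finds perfect, uninterrupted repeats; dedicated tools such as Tandem Repeats Finder also tolerate mismatches and indels within a repeat.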
The tremendous amount of genetic diversity generated by the alteration of these repetitive sequences is illustrated by the microsatellite genotype of each isolate with respect to these polymorphisms [Web fig. 2 (21)]. The genotype of each isolate is unique, even among contemporary isolates from the same geographic region. The potential for microsatellite diversity in P. falciparum is also evident from the number of distinct alleles for each polymorphic microsatellite repeat (21). These data support a high rate of microsatellite mutation within introns, presumably as a consequence of replication slippage (23, 24).

In contrast to the microsatellite variation, SNPs within the introns are rare (Table 1). Altogether, we observed eight SNPs in the introns. Among these, five were located within microsatellite polymorphisms. Of the 4217 bp of intron sequence (counting with respect to the 3D7 reference sequence), only 800 bp (19%) are located in polymorphic microsatellites, yet these 800 bp contain 5 of the 8 SNPs (62.5%). The excess of SNPs in the microsatellite repeats is, therefore, highly significant (P = 0.008, Fisher's exact test). This finding strongly suggests that the process of replication slippage that generates microsatellite variation (24) also increases the rate of single-nucleotide substitutions. Therefore, we have ignored the five SNPs associated with microsatellite polymorphisms in estimating the time since the most recent common ancestor of all extant P. falciparum.

The excess SNPs within polymorphic microsatellite repeats may also explain the relatively high frequency of synonymous polymorphisms within amino acid repeat sequences in certain proteins, such as the circumsporozoite protein (24). SNPs are introduced within repetitive sequences in both introns and exons, but selective constraints on coding sequences result in fewer polymorphisms within coding regions of the genome.
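The significance of the SNP excess in polymorphic microsatellites can be checked from the counts given in the text. The 2×2 table is not stated, so the layout here is an assumption: each of the 4217 intron positions is classified as SNP or non-SNP and as inside or outside the 800 bp of polymorphic microsatellite sequence. The sketch computes a one-sided Fisher's exact P value as a hypergeometric upper tail using only the Python standard library; the function name `fisher_one_sided_greater` is hypothetical.

```python
from math import comb


def fisher_one_sided_greater(a, b, c, d):
    """One-sided Fisher's exact test for the 2x2 table [[a, b], [c, d]].

    Returns P(X >= a), where X is hypergeometric with all table margins
    held fixed (alternative: row 1 is enriched for column 1).
    """
    row1 = a + b       # positions in polymorphic microsatellites
    col1 = a + c       # SNP positions
    n = a + b + c + d  # all intron positions
    denom = comb(n, col1)
    p = 0.0
    for x in range(a, min(row1, col1) + 1):
        # Probability that exactly x of the col1 SNPs fall in row 1.
        p += comb(row1, x) * comb(n - row1, col1 - x) / denom
    return p


# Counts from the text; the per-position 2x2 layout is an assumption.
snps_in_ms, bp_in_ms = 5, 800
snps_total, bp_total = 8, 4217
table = (
    snps_in_ms,                                          # SNP, in microsatellite
    bp_in_ms - snps_in_ms,                               # non-SNP, in microsatellite
    snps_total - snps_in_ms,                             # SNP, elsewhere
    (bp_total - bp_in_ms) - (snps_total - snps_in_ms),   # non-SNP, elsewhere
)
p_value = fisher_one_sided_greater(*table)
print(f"one-sided P = {p_value:.4f}")
```

Under this layout the result is about 0.008, in line with the reported value. Readers with SciPy installed could cross-check with `scipy.stats.fisher_exact([[5, 795], [3, 3414]], alternative="greater")`.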
Our findings suggest that antigenic variation associated with these repeated amino acid sequences has arisen within P. falciparum, rather than by lateral transfer or some other mechanism.

Discounting the intron sequence located in microsatellite polymorphisms, we sequenced 3417 bp in each of eight isolates (total of 27,336 bp) and identified only one certain SNP. The other two of the three remaining SNPs are found in one small intron predicted by GlimmerM (25) and only in the D6 isolate, where there is evidence of alternative splicing [Web fig. 3 (21)] within the aspartyl protease gene. Therefore, we present the statistical analysis both with and without the D6 SNPs. Combining our data with those from Rich et al. (14), and assuming that the rate of nucleotide substitution in unique intron sequence and monomorphic microsatellites is equal to that for fourfold-degenerate sites in coding regions, we estimate the age of the most recent common ancestor (MRCA) of all extant P. falciparum to be in the range of 3200 to 7700 years [the estimate was obtained with equa-

1 The Harvard-Oxford Malaria Genome Diversity Project.
2 Department of Immunology and Infectious Diseases, Harvard School of Public Health, Boston, MA 02115, USA.
3 Department of Zoology, University of Oxford, South Parks Road, OX1 3PS, Oxford, UK.
4 Department of Organismic and Evolutionary Biology, Harvard University, Cambridge, MA 02138, USA.
5 Department of Botany, Norwegian University of Science and Technology, N-7491, Trondheim, Norway.
*These authors contributed equally to this work.
To whom correspondence should be addressed. E-mail: dfwirth@hsph.harvard.edu

REPORTS. Science, Vol. 293, 20 July 2001, p. 482 (www.sciencemag.org)