Regions of Low Single-Nucleotide Polymorphism Incidence in
Human and Orangutan Xq: Deserts and Recent Coalescences
Raymond D. Miller,¹ Patricia Taillon-Miller, and Pui-Yan Kwok
Division of Dermatology, Washington University School of Medicine, St. Louis, Missouri 63110
Received August 2, 2000; accepted October 12, 2000
While scanning for single-nucleotide polymorphisms (SNPs) in the human Xq25–q28 region of CEPH families, we found six long "deserts" of low SNP incidence representing 28% of the genomic region investigated. One was 1.66 Mb in length. To determine whether these SNP deserts were due to reduced input of mutations or to recent coalescent events such as bottlenecks or selective sweeps, comparative sequence was determined from a female orangutan. The mean divergence was 2.9% and was not reduced in deserts compared with nondesert regions. Thus, the best explanation for the SNP deserts is recent coalescent events in humans. These events are the cause of substantial variation in human noncoding SNP incidence. In addition, the mutational spectrum in humans and orangutans was estimated as 63% A/G (and C/T), 17% A/C (and G/T), 8% C/G, 4% A/T, and 8% insertions/deletions. The average lifetime of a SNP destined to become fixed for a new allele between these species was estimated as 284,000 years. © 2001 Academic Press
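The abstract reports a mean human–orangutan divergence and a mutational spectrum in five classes. As an illustration only, the sketch below shows one way such tallies could be computed from a pairwise alignment; it is not the authors' pipeline. The function name `compare`, the use of `-` for gaps, treating a run of consecutive gap columns as a single insertion/deletion event, and computing divergence over ungapped sites are all assumptions made for this example.

```python
from collections import Counter

# Unordered base pair -> substitution class, following the abstract's grouping:
# A/G with C/T, A/C with G/T, and C/G and A/T on their own.
CLASSES = {
    frozenset("AG"): "A/G (C/T)", frozenset("CT"): "A/G (C/T)",
    frozenset("AC"): "A/C (G/T)", frozenset("GT"): "A/C (G/T)",
    frozenset("CG"): "C/G",
    frozenset("AT"): "A/T",
}


def compare(seq1: str, seq2: str):
    """Return (divergence, spectrum) for two sequences of an existing alignment.

    Hypothetical conventions: '-' is a gap; a run of consecutive gap columns
    counts as one indel event; columns with any other non-ACGT symbol are
    skipped; divergence = substitutions / ungapped ACGT columns.
    """
    if len(seq1) != len(seq2):
        raise ValueError("sequences must be aligned to the same length")
    counts = Counter()
    compared = 0     # ungapped columns where both bases are A, C, G or T
    in_gap = False   # True while inside a run of gap columns
    for a, b in zip(seq1.upper(), seq2.upper()):
        if a == "-" and b == "-":
            continue  # column empty in both sequences: ignore it
        if a == "-" or b == "-":
            if not in_gap:
                counts["indel"] += 1  # first column of a new gap run
                in_gap = True
            continue
        in_gap = False
        if a not in "ACGT" or b not in "ACGT":
            continue
        compared += 1
        if a != b:
            counts[CLASSES[frozenset((a, b))]] += 1
    substitutions = sum(v for k, v in counts.items() if k != "indel")
    divergence = substitutions / compared if compared else 0.0
    events = sum(counts.values())
    spectrum = {k: v / events for k, v in counts.items()} if events else {}
    return divergence, spectrum


# Toy alignment (made-up sequences, not data from this study):
print(compare("AACCGGTT", "AA--GGTA"))
```

Grouping each substitution with its complement (A/G with C/T, A/C with G/T) reflects that a change read on one strand is the complementary change on the other, so the strand orientation of the alignment does not affect the spectrum.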
INTRODUCTION
Single-nucleotide polymorphisms (SNPs), the predominant genetic variation within the human species, are likely to be responsible for many phenotypic differences between individuals. Existing human SNPs, created by mutation, undoubtedly represent a small surviving fraction, determined and geographically apportioned by migration, by chance (including demographic events), and potentially by selection (Chakravarti, 1999). Within protein-coding regions, measures of SNP variation (including SNP incidence and nucleotide diversity, the probability that a homologous nucleotide in two sequences is not the same) are reduced at sites causing coding changes compared with silent sites. This pattern has been interpreted as reflecting selection against deleterious alleles (Cargill et al., 1999; Chakravarti, 1999; Halushka et al., 1999).
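Nucleotide diversity as defined here can be estimated by averaging, over every pair of sampled sequences, the fraction of homologous sites at which the pair differs. The sketch below assumes a gap-free alignment of equal-length sequences with no ambiguous bases; the function name `nucleotide_diversity` is hypothetical, and real data would need explicit handling of missing or unaligned positions.

```python
from itertools import combinations


def nucleotide_diversity(seqs: list[str]) -> float:
    """Mean per-site proportion of differences over all pairs of sequences.

    Assumes the sequences are already aligned, have equal length, and
    contain no gaps or ambiguous bases (a simplification).
    """
    if len(seqs) < 2:
        raise ValueError("need at least two sequences")
    length = len(seqs[0])
    if length == 0 or any(len(s) != length for s in seqs):
        raise ValueError("sequences must be non-empty and of equal length")
    pairs = list(combinations(seqs, 2))
    total = 0.0
    for s1, s2 in pairs:
        # fraction of homologous sites at which this pair differs
        total += sum(a != b for a, b in zip(s1, s2)) / length
    return total / len(pairs)


# Three made-up 10-bp haplotypes with one SNP at the last site:
print(nucleotide_diversity(["ACGTACGTAC", "ACGTACGTAC", "ACGTACGTAT"]))
```

On this scale, a diversity of 0.063% corresponds to roughly one difference per 1,600 bp between two randomly chosen chromosomes.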
In noncoding regions, it is conceptually attractive as a working hypothesis to consider human nucleotide diversity a constant; it has been estimated as 0.063%, an order of magnitude less than in Drosophila melanogaster (Nachman et al., 1998). However, these authors further observed that noncoding nucleotide diversity was not constant in their study, ranging among locations from no differences to 0.184%, and they found a weak positive correlation between nucleotide diversity and the local recombination rate (Nachman et al., 1998). Other studies in humans have also detected a range of values for noncoding nucleotide diversity (Chakravarti, 1999; Nachman and Crowell, 2000; Taillon-Miller et al., 1998). In strains of the laboratory mouse, more STSs contain no SNPs, and more STSs contain multiple SNPs, than would be expected from a Poisson distribution (Lindblad-Toh et al., 2000). The question arises: in human noncoding regions, why does diversity of diversity exist, that is, why does nucleotide diversity itself vary from region to region?
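The Poisson comparison mentioned above can be illustrated as follows. If SNPs fell independently and uniformly among STSs, the count per STS would be approximately Poisson with mean λ equal to the average SNP count, so a fraction e^(-λ) of STSs would carry no SNP and a fraction 1 - e^(-λ)(1 + λ) would carry two or more. The sketch below is a hedged illustration, not the analysis of Lindblad-Toh et al. (2000); the function name `poisson_check` and the use of a variance-to-mean ratio as a summary are choices made for this example.

```python
import math
from collections import Counter


def poisson_check(snps_per_sts: list[int]) -> dict:
    """Compare observed SNP counts per STS with a Poisson expectation.

    The Poisson mean is estimated as the average number of SNPs per STS.
    A dispersion index (sample variance / mean) well above 1 indicates an
    excess of both empty and multi-SNP STSs relative to Poisson.
    """
    n = len(snps_per_sts)
    if n < 2:
        raise ValueError("need at least two STSs")
    lam = sum(snps_per_sts) / n
    if lam == 0:
        raise ValueError("no SNPs observed; Poisson mean is zero")
    counts = Counter(snps_per_sts)
    variance = sum((x - lam) ** 2 for x in snps_per_sts) / (n - 1)
    return {
        "mean": lam,
        "expected_zero": n * math.exp(-lam),                     # n * P(X = 0)
        "observed_zero": counts[0],
        "expected_multi": n * (1 - math.exp(-lam) * (1 + lam)),  # n * P(X >= 2)
        "observed_multi": sum(c for k, c in counts.items() if k >= 2),
        "dispersion_index": variance / lam,
    }


# Hypothetical counts for ten STSs (not data from Lindblad-Toh et al.):
print(poisson_check([0, 0, 0, 0, 0, 0, 1, 2, 3, 4]))
```

With a mean of one SNP per STS, a Poisson model predicts about 3.7 of these ten made-up STSs to be empty and about 2.6 to carry two or more SNPs; the toy data have six and three, respectively, and a dispersion index of about 2.2.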
In consonance with ideas of evolutionary genetics, the incidence of noncoding SNPs in a region is the result of three factors: first, the input rate of mutations forming SNPs; second, the removal rate of individual SNPs; and third, the time since the region had a single ancestral sequence. Since factor 2, the removal rate (fixation) of individual SNPs, is not a regional phenomenon, differences in the regional incidence of SNPs must be explained by factor 1, the input rate of mutations, or by factor 3, the time since the region had a single ancestral sequence. One reason that factor 1, the input rate of mutations, might differ between two regions of the genome is that the underlying local mutation rate could differ owing to base composition and/or to the effect of unknown higher order structures on the access of DNA repair molecules. For example, by the classical neutral model, n = 4Nu + 1, where n is the effective number of alleles (a measure of variation), N is the effective population size, and u is the mutation rate (Kimura, 1983). If u, the mutation
Sequence data from this article have been deposited with the EMBL/GenBank Data Libraries under Accession Nos. AF280892–AF280938 (orangutan sequence) and G65815–G65814 (additional STSs). Also see accession numbers for STSs and SNPs in Taillon-Miller and Kwok (2000, Genomics 65, 195–202).
¹ To whom correspondence should be addressed at Division of Dermatology, Box 8123, Washington University School of Medicine, 660 S. Euclid Avenue, St. Louis, MO 63110. Telephone: (314) 362-8199. Fax: (314) 362-8159. E-mail: rmiller@psts.wustl.edu.
Genomics 71, 78–88 (2001)
doi:10.1006/geno.2000.6417, available online at http://www.idealibrary.com
0888-7543/01 $35.00
Copyright © 2001 by Academic Press
All rights of reproduction in any form reserved.