J Mol Evol (1995) 41:1038-1047
JOURNAL OF MOLECULAR EVOLUTION
© Springer-Verlag New York Inc. 1995
The Contribution of Slippage-Like Processes to Genome Evolution
John M. Hancock
Gene & Genome Evolution Group, MRC Clinical Sciences Centre, Royal Postgraduate Medical School, Hammersmith Hospital,
London W12 0NN, UK
Received: 4 November 1994 / Accepted: 13 April 1995
Abstract. Simple sequences present in long (>30 kb)
sequences representative of the single-copy genome of
five species (Homo sapiens, Caenorhabditis elegans,
Saccharomyces cerevisiae, E. coli, and Mycobacterium
leprae) have been analyzed. A close relationship was
observed between genome size and the overall level of
sequence repetition. This suggested that the incorpora-
tion of simple sequences had accompanied increases of
genome size during evolution. Densities of simple se-
quence motifs were higher in noncoding regions than in
coding regions in eukaryotes but not in eubacteria. All
five genomes showed very biased frequency distributions
of simple sequence motifs, particularly the
eukaryotes, in which AAA and TTT predominated.
Interspecific comparisons showed that noncoding sequences
in eukaryotes had highly significantly similar
frequency distributions of simple sequence motifs,
but this was not true of coding sequences. ANOVA of
the frequency distributions of simple sequence motifs
indicated strong contributions from motif base composi-
tion and repeat unit length, but much of the variation
remained unexplained by these parameters. The se-
quence composition of simple sequences therefore
appears to reflect both underlying sequence biases in
slippage-like processes and the action of selection. Fre-
quency distributions of simple sequence motifs in coding
sequences correlated weakly or not at all with those in
noncoding sequences. Selection on coding sequences to
eliminate undesirable sequences may therefore have
been strong, particularly in the human lineage.
Key words: Genome evolution -- Replication slip-
page -- C-value -- Simple sequences -- Microsatellites
Introduction
The haploid DNA content of organisms (their genome
size or C-value) can vary dramatically even between
closely related species (see Cavalier-Smith 1985). A
number of molecular processes, in particular transposi-
tion and the amplification of satellite sequences, have
been implicated in such changes, but the contribution of
slipped strand mispairing (Levinson and Gutman 1987)
and other slippage-like processes (see for example Be-
benek and Kunkel 1990; Lichtenauer-Kaligis et al. 1993;
Jeffreys et al. 1994; Richards and Sutherland 1994) that
can give rise to interspersed repetitive sequences based
on short motifs (simple sequences) has been largely ig-
nored. Slippage-like processes may be a very ancient
feature of replicative systems (see Li and Nicolaou 1994;
Sievers and von Kiedrowski 1994).
Early experimental and computer analyses showed
that simple sequences are very common in genomes and
in sequence databases and that simple sequences of dif-
ferent composition are found at different frequencies
(Tautz and Renz 1984; Tautz et al. 1986). Bias in the
frequency of different simple sequences has been con-
firmed more recently in an analysis of the yeast chromo-
some III sequence (Valle 1993). Slippage-like processes
have been implicated in the evolution of a number of
biological molecules including the large- and small-
subunit ribosomal RNAs (rRNAs) (Hancock and Dover
1988, 1990; Hancock 1995), the Drosophila develop-
mental gene hunchback (Treier et al. 1989), the mito-
chondrial control region (D-loop) in mammals (Hoelzel
et al. 1991, 1993), and the eukaryotic RNA polymerase II
transcription factor TBP (TATA-binding protein) (Han-
cock 1993). However, although it is clear that simple