J Mol Evol (1995) 41:1038-1047
Journal of Molecular Evolution
© Springer-Verlag New York Inc. 1995

The Contribution of Slippage-Like Processes to Genome Evolution

John M. Hancock

Gene & Genome Evolution Group, MRC Clinical Sciences Centre, Royal Postgraduate Medical School, Hammersmith Hospital, London W12 0NN, UK

Received: 4 November 1994 / Accepted: 13 April 1995

Abstract. Simple sequences present in long (>30 kb) sequences representative of the single-copy genome of five species (Homo sapiens, Caenorhabditis elegans, Saccharomyces cerevisiae, E. coli, and Mycobacterium leprae) have been analyzed. A close relationship was observed between genome size and the overall level of sequence repetition. This suggested that the incorporation of simple sequences had accompanied increases of genome size during evolution. Densities of simple sequence motifs were higher in noncoding regions than in coding regions in eukaryotes but not in eubacteria. All five genomes showed very biased frequency distributions of simple sequence motifs, particularly the eukaryotes, where AAA and TTT predominated. Interspecific comparisons showed that noncoding sequences in eukaryotes had highly significantly similar frequency distributions of simple sequence motifs, but this was not true of coding sequences. ANOVA of the frequency distributions of simple sequence motifs indicated strong contributions from motif base composition and repeat unit length, but much of the variation remained unexplained by these parameters. The sequence composition of simple sequences therefore appears to reflect both underlying sequence biases in slippage-like processes and the action of selection. Frequency distributions of simple sequence motifs in coding sequences correlated weakly or not at all with those in noncoding sequences. Selection on coding sequences to eliminate undesirable sequences may therefore have been strong, particularly in the human lineage.
Key words: Genome evolution -- Replication slippage -- C-value -- Simple sequences -- Microsatellites

Introduction

The haploid DNA content of organisms (their genome size or C-value) can vary dramatically even between closely related species (see Cavalier-Smith 1985). A number of molecular processes, in particular transposition and the amplification of satellite sequences, have been implicated in such changes, but the contribution of slipped strand mispairing (Levinson and Gutman 1987) and other slippage-like processes (see for example Bebenek and Kunkel 1990; Lichtenauer-Kaligis et al. 1993; Jeffreys et al. 1994; Richards and Sutherland 1994) that can give rise to interspersed repetitive sequences based on short motifs (simple sequences) has been largely ignored. Slippage-like processes may be a very ancient feature of replicative systems (see Li and Nicolaou 1994; Sievers and von Kiedrowski 1994).

Early experimental and computer analyses showed that simple sequences are very common in genomes and in sequence databases and that simple sequences of different composition are found at different frequencies (Tautz and Renz 1984; Tautz et al. 1986). Bias in the frequency of different simple sequences has been confirmed more recently in an analysis of the yeast chromosome III sequence (Valle 1993). Slippage-like processes have been implicated in the evolution of a number of biological molecules including the large- and small-subunit ribosomal RNAs (rRNAs) (Hancock and Dover 1988, 1990; Hancock 1995), the Drosophila developmental gene hunchback (Treier et al. 1989), the mitochondrial control region (D-loop) in mammals (Hoelzel et al. 1991, 1993), and the eukaryotic RNA polymerase II transcription factor TBP (TATA-binding protein) (Hancock 1993). However, although it is clear that simple