Mutation rate variation in the mammalian genome

Hans Ellegren, Nick GC Smith and Matthew T Webster

Recent advances in the large-scale sequencing of mammalian genomes have provided a means to study divergence not only in genic sequences but also in the non-coding bulk of DNA. There is evidence of significant variation in the levels of divergence between presumably neutral regions, pointing to an underlying variation in the rate of mutation across the genome. Such variation appears to occur on different scales, including sequence context effects (the influence of neighboring nucleotides on the rate of mutation at individual sites), variation within chromosomes (on the scales of kilobases as well as megabases), and variation between chromosomes (among autosomes as well as between autosomes and sex chromosomes). An important aim for further research in this area is to determine whether there is an ultimate evolutionary explanation for mutation rate variation within mammalian genomes.

Addresses
Department of Evolutionary Biology, Uppsala University, Norbyvägen 18D, SE-752 36 Uppsala, Sweden
e-mail: Hans.Ellegren@ebc.uu.se

Current Opinion in Genetics & Development 2003, 13:562–568

This review comes from a themed issue on Genomes and evolution
Edited by Evan Eichler and Nipam Patel

0959-437X/$ – see front matter © 2003 Elsevier Ltd. All rights reserved.

DOI 10.1016/j.gde.2003.10.008

Abbreviation
UTR untranslated region

Introduction
Mutation is a fundamental process without which evolution would not occur. Knowledge about mutation rates is therefore key to evolutionary and population genetics, but also to several other areas. For instance, proper evolutionary dating founded on molecular clocks requires knowledge of the mutation rate. Moreover, because mutation is a double-edged sword that also causes genetic disease, understanding the rate of mutation is important in medical genetics.
Furthermore, if we are to infer selection from patterns of divergence, an important aspect of comparative and functional genomics, then we need realistic null models of neutral variation (i.e. knowledge of mutation patterns). Finally, knowledge about mutation rates can shed light on issues relating to the mechanistic basis of germline mutation; there is, for instance, an ongoing debate concerning the relative importance of replication errors as a source of mutation.

There is an increasing body of evidence pointing to within-genome variation in the substitution rate at presumably neutral sites, a variation most easily explained by an underlying variation in the rate of mutation. The first such hints were offered by the observation that synonymous (silent) substitution rates vary between mammalian genes [1]. However, inferring patterns of mutation from patterns of substitution in silent sites can be problematic (Table 1). Fortunately, the recent burst of large-scale genomic sequence data has permitted the study of mutation rate variation at a new and much larger scale. Importantly, we are now starting to learn about mutation processes in the non-coding bulk of DNA, both in repetitive and unique sequences (Table 1). Here we review mammalian mutation rate variation from a genomics perspective, paying particular attention to recent data obtained in large-scale sequence comparisons within primates. We shall focus on the process of point mutation, because mutations involving insertions and deletions (including short indels, transpositions and length mutation in tandem repetitive DNA) are generally thought to have a mechanistic basis different from that of point mutation.

Methodological aspects
Given that spontaneous germline mutation rates for point substitutions in mammals are only $10^{-8}$ per bp per generation [2], we cannot hope to observe enough mutations directly to reliably infer mutation rates.
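To make the scale of this problem concrete, the short Python sketch below computes how many new point mutations one would expect to see directly, using the rate of $10^{-8}$ per bp per generation quoted above. The region lengths and numbers of parent-to-offspring transmissions are hypothetical values chosen only for illustration, and the Poisson approximation (rare, independent mutations) is an assumption of this sketch rather than something stated in the review.

```python
import math

# Rate quoted in the text [2]: per base pair, per generation.
MU = 1e-8


def expected_mutations(region_bp: int, transmissions: int, mu: float = MU) -> float:
    """Expected number of new point mutations in a region of `region_bp`
    base pairs, summed over `transmissions` parent-to-offspring transmissions."""
    return mu * region_bp * transmissions


def prob_at_least_one(region_bp: int, transmissions: int, mu: float = MU) -> float:
    """Probability of observing at least one new mutation, assuming the
    number of mutations is Poisson distributed (an assumption of this sketch)."""
    return 1.0 - math.exp(-expected_mutations(region_bp, transmissions, mu))


if __name__ == "__main__":
    # Hypothetical study designs: (region length in bp, number of transmissions).
    designs = [(10_000, 100), (1_000_000, 100), (1_000_000, 1_000)]
    for region_bp, transmissions in designs:
        mean = expected_mutations(region_bp, transmissions)
        p = prob_at_least_one(region_bp, transmissions)
        print(f"{region_bp:>9} bp x {transmissions:>5} transmissions: "
              f"expected {mean:.2f} mutations, P(>=1) = {p:.3f}")
```

Under these assumptions, even a megabase followed through a hundred transmissions is expected to yield about one new mutation, far too few to compare rates between regions.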
As a consequence, mutation patterns are usually studied indirectly by comparing orthologous sequences from different species. Polymorphism data based on sequence variation within species also represent a useful and important source of information concerning mutation, although here we focus on inter-species sequence comparisons. To infer mutation reliably, the sequences must be evolving neutrally, so that sequence divergence is proportional to mutation [3], and it must be possible to infer substitution events properly from sequence data. We can then infer variation in mutation from variation in substitution (but note problems with ancestral polymorphism [4]).

Unfortunately, the requirements of selective neutrality and unbiased methods of sequence analysis are rather difficult to verify. Comparative sequence analyses are starting to reveal an abundance of non-coding DNA regions that are conserved across long-range mammalian comparisons [5,6]. The inference that all such conserved regions are subject to negative selection is unwarranted, however, because variation in mutation rates is also expected to generate conserved regions. For example, for a substitution rate of 10%, the chance of a perfectly conserved block of 20 bp is $0.9^{20} = 0.12$. However, if half the regions have a