ARTICLE
doi:10.1038/nature10532

The evolution of gene expression levels in mammalian organs

David Brawand 1,2*, Magali Soumillon 1,2*, Anamaria Necsulea 1,2*, Philippe Julien 1,2, Gábor Csárdi 2,3, Patrick Harrigan 4, Manuela Weier 1, Angélica Liechti 1, Ayinuer Aximu-Petri 5, Martin Kircher 5, Frank W. Albert 5†, Ulrich Zeller 6, Philipp Khaitovich 7, Frank Grützner 8, Sven Bergmann 2,3, Rasmus Nielsen 4,9, Svante Pääbo 5 & Henrik Kaessmann 1,2

Changes in gene expression are thought to underlie many of the phenotypic differences between species. However, large-scale analyses of gene expression evolution were until recently prevented by technological limitations. Here we report the sequencing of polyadenylated RNA from six organs across ten species that represent all major mammalian lineages (placentals, marsupials and monotremes) and birds (the evolutionary outgroup), with the goal of understanding the dynamics of mammalian transcriptome evolution. We show that the rate of gene expression evolution varies among organs, lineages and chromosomes, owing to differences in selective pressures: transcriptome change was slow in nervous tissues and rapid in testes, slower in rodents than in apes and monotremes, and rapid for the X chromosome right after its formation. Although gene expression evolution in mammals was strongly shaped by purifying selection, we identify numerous potentially selectively driven expression switches, which occurred at different rates across lineages and tissues and which probably contributed to the specific organ biology of various mammals.

Shared mammalian traits include lactation, hair and relatively large brains with unique structures 1. In addition to these traits, individual lineages have evolved distinct anatomical, physiological and behavioural characteristics relating to differences in reproduction, life span, cognitive abilities and disease susceptibility.
The molecular changes underlying these phenotypic shifts and the associated selective pressures have begun to be investigated using available mammalian genomes 2, the number of which is rapidly increasing. However, although genome analyses may uncover protein-coding changes that potentially underlie phenotypic alterations, regulatory mutations affecting gene expression probably explain many or even most phenotypic differences between species 3.

Until recently, comparisons of mammalian transcriptomes were essentially restricted to closely related primates 4–8 or mice 5, although human–mouse comparisons using microarrays were also attempted 9. Nevertheless, microarrays require hybridization to species-specific probes, making between-species comparisons of transcript abundance difficult 6. The development of RNA sequencing (RNA-seq) protocols now allows for accurate and sensitive assessments of expression levels 10. The power of RNA-seq for transcriptome assessment was recently demonstrated for human individuals 11,12 and closely related primates 13,14.

RNA-seq and genome reannotation

To study mammalian transcriptome evolution at high resolution, we generated RNA-seq data (~3.2 billion Illumina Genome Analyser IIx reads of 76 base pairs) for the polyadenylated RNA fraction of brain (cerebral cortex or whole brain without cerebellum), cerebellum, heart, kidney, liver and testis (usually from one male and one female per somatic tissue, and two males for testis) from nine mammalian species (Supplementary Tables 1 and 2, Methods and Supplementary Note): placental mammals (great apes, including humans; rhesus macaque; and mouse), marsupials (grey short-tailed opossum) and monotremes (platypus). Corresponding data (~0.3 billion reads) were generated for a bird (red jungle fowl, a non-domesticated chicken) and used as an evolutionary outgroup.
We refined existing Ensembl 15 genome annotations by performing an initial read mapping to detect transcribed regions and splice junctions (Methods and Supplementary Note), which resulted in modified boundaries for ~31,000–44,500 exons and the addition of 20,000–34,500 new exons and 66,000–125,000 new splice junctions to known protein-coding genes (Supplementary Note Tables 4 and 5). We also searched de novo for multi-exonic transcribed loci; our results validated most Ensembl-annotated protein-coding genes, pseudogenes and long non-coding RNA genes (Supplementary Note Table 11), but we also detected thousands of multi-exonic transcribed loci (possibly representing protein-coding or non-coding RNA genes) in previously unannotated regions (Supplementary Note Table 10).

Newly detected exons are transcribed at lower levels and are significantly less conserved, at the sequence level, than Ensembl-annotated exons (two-tailed P < 10^-8, Mann–Whitney U-test; Supplementary Fig. 1). However, the sequence conservation level is higher for new exons than for flanking introns, with visible peaks around splice sites, indicating that many of these exon sequences are preserved by purifying selection 16.

Depending on the species, 11–30% of the total genomic length is covered by unambiguously mapped RNA-seq reads (Table 1). Much of the covered length is explained by retained introns, but substantial coverage is also found outside annotated regions (Table 1). Our data suggest that large proportions (>34–61%) of amniote (that is, mammal and bird) genomes are transcribed, consistent with previous work 17.

*These authors contributed equally to this work.
1 Center for Integrative Genomics, University of Lausanne, 1015 Lausanne, Switzerland.
2 Swiss Institute of Bioinformatics, 1015 Lausanne, Switzerland.
3 Department of Medical Genetics, University of Lausanne, 1005 Lausanne, Switzerland.
4 Department of Integrative Biology, University of California, Berkeley, California 94720, USA.
5 Max Planck Institute for Evolutionary Anthropology, 04103 Leipzig, Germany.
6 Chair of Systematic Zoology, Humboldt-University, 10099 Berlin, Germany.
7 CAS-MPG Partner Institute for Computational Biology, 200031 Shanghai, China.
8 The Robinson Institute, School of Molecular and Biomedical Science, University of Adelaide, Adelaide, South Australia 5005, Australia.
9 The Bioinformatics Center, University of Copenhagen, 2200 Copenhagen, Denmark.
†Present address: Lewis Sigler Institute for Integrative Genomics, Princeton University, Princeton, New Jersey 08544, USA.

20 OCTOBER 2011 | VOL 478 | NATURE | 343
©2011 Macmillan Publishers Limited. All rights reserved
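
Illustrative note. A quantity such as the percentage of genomic length covered by unambiguously mapped reads, reported above, can in principle be obtained by merging overlapping read alignments and dividing the merged length by the sequence length. The following minimal Python sketch shows that calculation only; it is not the pipeline used in this study. The function names, the half-open coordinate convention, the restriction to a single sequence and the toy alignments are all assumptions made for the example.

```python
from typing import Iterable, List, Tuple

# Assumption: each alignment is a half-open interval [start, end) in base
# pairs on ONE reference sequence. Real data would be grouped per chromosome.
Interval = Tuple[int, int]


def merge_intervals(intervals: Iterable[Interval]) -> List[Interval]:
    """Merge overlapping or touching intervals into disjoint blocks."""
    merged: List[Interval] = []
    for start, end in sorted(intervals):
        if merged and start <= merged[-1][1]:
            # Overlaps (or touches) the previous block: extend it.
            prev_start, prev_end = merged[-1]
            merged[-1] = (prev_start, max(prev_end, end))
        else:
            merged.append((start, end))
    return merged


def covered_fraction(intervals: Iterable[Interval], sequence_length: int) -> float:
    """Fraction of the sequence covered by at least one interval."""
    if sequence_length <= 0:
        raise ValueError("sequence_length must be positive")
    covered = sum(end - start for start, end in merge_intervals(intervals))
    return covered / sequence_length


# Hypothetical toy data: three 76-bp reads on a 1,000-bp sequence,
# two of which overlap each other.
toy_reads = [(0, 76), (50, 126), (300, 376)]
toy_fraction = covered_fraction(toy_reads, 1000)
```

Merging before summing ensures that positions covered by several reads are counted once, which is what a statement about covered genomic length requires; read depth is deliberately ignored.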