ARTICLE
doi:10.1038/nature10532
The evolution of gene expression levels in mammalian organs
David Brawand¹,²*, Magali Soumillon¹,²*, Anamaria Necsulea¹,²*, Philippe Julien¹,², Gábor Csárdi²,³, Patrick Harrigan⁴,
Manuela Weier¹, Angélica Liechti¹, Ayinuer Aximu-Petri⁵, Martin Kircher⁵, Frank W. Albert⁵†, Ulrich Zeller⁶,
Philipp Khaitovich⁷, Frank Grützner⁸, Sven Bergmann²,³, Rasmus Nielsen⁴,⁹, Svante Pääbo⁵ & Henrik Kaessmann¹,²
Changes in gene expression are thought to underlie many of the phenotypic differences between species. However,
large-scale analyses of gene expression evolution were until recently prevented by technological limitations. Here we
report the sequencing of polyadenylated RNA from six organs across ten species that represent all major mammalian
lineages (placentals, marsupials and monotremes) and birds (the evolutionary outgroup), with the goal of understanding
the dynamics of mammalian transcriptome evolution. We show that the rate of gene expression evolution varies among
organs, lineages and chromosomes, owing to differences in selective pressures: transcriptome change was slow in
nervous tissues and rapid in testes, slower in rodents than in apes and monotremes, and rapid for the X chromosome
right after its formation. Although gene expression evolution in mammals was strongly shaped by purifying selection, we
identify numerous potentially selectively driven expression switches, which occurred at different rates across lineages
and tissues and which probably contributed to the specific organ biology of various mammals.
Shared mammalian traits include lactation, hair and relatively large brains with unique structures¹. In addition to
these traits, individual lineages have evolved distinct anatomical, physiological and behavioural characteristics
relating to differences in reproduction, life span, cognitive abilities and disease susceptibility. The molecular
changes underlying these phenotypic shifts and the associated selective pressures have begun to be investigated using
available mammalian genomes², the number of which is rapidly increasing. However, although genome analyses may uncover
protein-coding changes that potentially underlie phenotypic alterations, regulatory mutations affecting gene expression
probably explain many or even most phenotypic differences between species³.
Until recently, comparisons of mammalian transcriptomes were essentially restricted to closely related primates⁴⁻⁸ or
mice⁵, although human–mouse comparisons using microarrays were also attempted⁹. Nevertheless, microarrays require
hybridization to species-specific probes, making between-species comparisons of transcript abundance difficult⁶. The
development of RNA sequencing (RNA-seq) protocols now allows for accurate and sensitive assessments of expression
levels¹⁰. The power of RNA-seq for transcriptome assessment was recently demonstrated for human individuals¹¹,¹² and
closely related primates¹³,¹⁴.
RNA-seq and genome reannotation
To study mammalian transcriptome evolution at high resolution, we generated RNA-seq data (~3.2 billion Illumina Genome
Analyser IIx reads of 76 base pairs) for the polyadenylated RNA fraction of brain (cerebral cortex or whole brain
without cerebellum), cerebellum, heart, kidney, liver and testis (usually from one male and one female per somatic
tissue, and two males for testis) from nine mammalian species (Supplementary Tables 1 and 2, Methods and Supplementary
Note): placental mammals (great apes, including humans; rhesus macaque; and mouse), marsupials (grey short-tailed
opossum) and monotremes (platypus). Corresponding data (~0.3 billion reads) were generated for a bird (red jungle
fowl, a non-domesticated chicken) and used as an evolutionary outgroup.
We refined existing Ensembl¹⁵ genome annotations by performing an initial read mapping to detect transcribed regions
and splice junctions (Methods and Supplementary Note), which resulted in modified boundaries for ~31,000–44,500 exons
and the addition of 20,000–34,500 new exons and 66,000–125,000 new splice junctions to known protein-coding genes
(Supplementary Note Tables 4 and 5). We also searched de novo for multi-exonic transcribed loci; our results validated
most Ensembl-annotated protein-coding genes, pseudogenes and long non-coding RNA genes (Supplementary Note Table 11),
but we also detected thousands of multi-exonic transcribed loci (possibly representing protein-coding or non-coding
RNA genes) in previously unannotated regions (Supplementary Note Table 10). Newly detected exons are transcribed at
lower levels and are significantly less conserved, at the sequence level, than Ensembl-annotated exons (two-tailed
P < 10⁻⁸, Mann–Whitney U-test; Supplementary Fig. 1). However, the sequence conservation level is higher for new exons
than for flanking introns, with visible peaks around splice sites, indicating that many of these exon sequences are
preserved by purifying selection¹⁶.
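
The comparison of conservation between newly detected and Ensembl-annotated exons relies on a two-tailed Mann–Whitney U-test. The following is a minimal sketch of that kind of test, not the authors' pipeline: it assumes the normal approximation (with tie correction, without continuity correction), and the function name `mann_whitney_u` and the toy conservation scores are hypothetical and not taken from the study.

```python
import math


def mann_whitney_u(x, y):
    """Two-sided Mann-Whitney U-test using the normal approximation.

    Returns (U statistic for sample x, two-sided p-value). Tied values
    receive average ranks, and the variance includes the tie correction.
    """
    n1, n2 = len(x), len(y)
    n = n1 + n2
    # Pool both samples, remembering which sample each value came from.
    pooled = sorted([(v, 0) for v in x] + [(v, 1) for v in y])

    rank_sum_x = 0.0
    tie_term = 0.0
    i = 0
    while i < n:
        # Find the block of tied values pooled[i:j].
        j = i
        while j < n and pooled[j][0] == pooled[i][0]:
            j += 1
        t = j - i
        avg_rank = (i + 1 + j) / 2.0  # mean of the 1-based ranks i+1 .. j
        n_from_x = sum(1 for k in range(i, j) if pooled[k][1] == 0)
        rank_sum_x += avg_rank * n_from_x
        tie_term += t ** 3 - t
        i = j

    u_x = rank_sum_x - n1 * (n1 + 1) / 2.0
    mean_u = n1 * n2 / 2.0
    var_u = n1 * n2 / 12.0 * ((n + 1) - tie_term / (n * (n - 1)))
    if var_u == 0:
        return u_x, 1.0  # all values identical: no evidence of a shift
    z = (u_x - mean_u) / math.sqrt(var_u)
    p_two_sided = math.erfc(abs(z) / math.sqrt(2.0))
    return u_x, p_two_sided


# Toy per-exon conservation scores (hypothetical values, for illustration only).
new_exon_scores = [0.10, 0.15, 0.22, 0.30, 0.35, 0.41]
annotated_exon_scores = [0.55, 0.62, 0.70, 0.78, 0.85, 0.91]

u_stat, p_value = mann_whitney_u(new_exon_scores, annotated_exon_scores)
print(u_stat, p_value)
```

With samples as large as a genome-wide exon set, the normal approximation is usually adequate; for very small samples an exact test would be the safer choice.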
Depending on the species, 11–30% of the total genomic length is covered by unambiguously mapped RNA-seq reads
(Table 1). Much of the covered length is explained by retained introns, but substantial coverage is also found outside
annotated regions (Table 1). Our data suggest that large proportions (>34–61%) of amniote (that is, mammal and bird)
genomes are transcribed, consistent with previous work¹⁷.
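
The covered fraction of a genome can be illustrated by taking the union of mapped-read intervals and dividing its length by the genome length. The sketch below is an assumption about how such a figure could be computed, not the method used in the study: the function name `covered_fraction` is hypothetical, intervals are half-open `[start, end)` on a single coordinate axis, and the example coordinates are invented.

```python
def covered_fraction(intervals, genome_length):
    """Fraction of a genome covered by the union of half-open [start, end) intervals."""
    covered = 0
    cur_start = cur_end = None
    for start, end in sorted(intervals):
        if cur_end is None or start > cur_end:
            # The current merged block is finished; add its length and start a new one.
            if cur_end is not None:
                covered += cur_end - cur_start
            cur_start, cur_end = start, end
        else:
            # Overlapping or adjacent read: extend the current merged block.
            cur_end = max(cur_end, end)
    if cur_end is not None:
        covered += cur_end - cur_start
    return covered / genome_length


# Three hypothetical 76-bp reads on a 1,000-bp toy sequence; the first two overlap.
toy_reads = [(0, 76), (50, 126), (200, 276)]
toy_fraction = covered_fraction(toy_reads, 1000)
print(toy_fraction)
```

Merging before summing matters because overlapping reads would otherwise be counted more than once, which would inflate the covered length.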
*These authors contributed equally to this work.
¹Center for Integrative Genomics, University of Lausanne, 1015 Lausanne, Switzerland.
²Swiss Institute of Bioinformatics, 1015 Lausanne, Switzerland.
³Department of Medical Genetics, University of Lausanne, 1005 Lausanne, Switzerland.
⁴Department of Integrative Biology, University of California, Berkeley, California 94720, USA.
⁵Max Planck Institute for Evolutionary Anthropology, 04103 Leipzig, Germany.
⁶Chair of Systematic Zoology, Humboldt-University, 10099 Berlin, Germany.
⁷CAS-MPG Partner Institute for Computational Biology, 200031 Shanghai, China.
⁸The Robinson Institute, School of Molecular and Biomedical Science, University of Adelaide, Adelaide, South Australia 5005, Australia.
⁹The Bioinformatics Center, University of Copenhagen, 2200 Copenhagen, Denmark.
†Present address: Lewis Sigler Institute for Integrative Genomics, Princeton University, Princeton, New Jersey 08544, USA.
20 OCTOBER 2011 | VOL 478 | NATURE | 343
©2011 Macmillan Publishers Limited. All rights reserved