Origin and evolution of new exons in rodents
Wen Wang,¹,⁹,¹⁰ Hongkun Zheng,²,⁹ Shuang Yang,¹,³,⁹ Haijing Yu,⁴,⁹ Jun Li,² Huifeng Jiang,¹,³
Jianning Su,² Lei Yang,² Jianguo Zhang,² Jason McDermott,⁵ Ram Samudrala,⁵ Jian Wang,²
Huanming Yang,² Jun Yu,² Karsten Kristiansen,⁸ Gane Ka-Shu Wong,²,⁶,¹⁰ and Jun Wang²,⁷,⁸,¹⁰
¹ CAS-Max Planck Junior Research Group, Key Laboratory of Cellular and Molecular Evolution, Kunming Institute of Zoology, Chinese Academy of Sciences, Kunming, Yunnan 650223, China;
² Beijing Institute of Genomics, Chinese Academy of Sciences, Beijing 101300, China;
³ Graduate School of Chinese Academy Sciences, Beijing 100039, China;
⁴ Key Laboratory of Biodiversity Conservation and Utilization & Human Genetics Center of Yunnan University, Kunming, Yunnan 650091, China;
⁵ Computational Genomics Group, Department of Microbiology, University of Washington, Seattle, Washington 98195, USA;
⁶ UW Genome Center, Department of Medicine, University of Washington, Seattle, Washington 98195, USA;
⁷ The Institute of Human Genetics, University of Aarhus, DK-8000 Aarhus C, Denmark;
⁸ Department of Biochemistry and Molecular Biology, University of Southern Denmark, DK-5230 Odense M, Denmark
Differences in gene number among organisms demonstrate that new gene origination is a fundamental biological
process in evolution. Exon shuffling has been universally observed in the formation of new genes. Yet to be learned
are the ways new exons originate and evolve, and how often new exons appear. To address these questions, we
identified 2695 newly evolved exons in the mouse and rat by comparing the expressed sequences of 12,419
orthologous genes between human and mouse, using 743,856 pig ESTs as the outgroup. The new exon origination
rate is about 2.71 × 10⁻³ per gene per million years. These new exons have markedly accelerated rates both of
nonsynonymous substitutions and of insertions/deletions (indels). A much higher proportion of new exons have
Ka/Ks ratios >1 (where Ka is the nonsynonymous substitution rate and Ks is the synonymous substitution rate) than
do the old exons shared by human and mouse, implying a role of positive selection in the rapid evolution. The
majority of these new exons have sequences unique in the genome, suggesting that most new exons might originate
through “exonization” of intronic sequences. Most of the new exons appear to be alternative exons that are
expressed at low levels.
[Supplemental material is available online at www.genome.org.]
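As a rough check on the figures quoted in the abstract, the arithmetic can be sketched in a few lines of Python. This is only an illustration under stated assumptions: it assumes the origination rate is a simple count divided by (number of genes × elapsed time), which the abstract does not spell out, and the function names are hypothetical. The elapsed time is not given in this section; the value printed below is back-calculated from the reported numbers, not taken from the paper. The Ka/Ks helper simply forms the ratio defined in the abstract.

```python
def origination_rate(new_exons: int, genes: int, time_myr: float) -> float:
    """New exons per gene per million years (assumed form: count / (genes * time))."""
    if genes <= 0 or time_myr <= 0:
        raise ValueError("genes and time_myr must be positive")
    return new_exons / (genes * time_myr)


def implied_time_span_myr(new_exons: int, genes: int, rate: float) -> float:
    """Invert the assumed rate formula to get the time span it implies."""
    if genes <= 0 or rate <= 0:
        raise ValueError("genes and rate must be positive")
    return new_exons / (genes * rate)


def ka_ks_ratio(ka: float, ks: float) -> float:
    """Ratio of nonsynonymous (Ka) to synonymous (Ks) substitution rates."""
    if ks <= 0:
        raise ValueError("Ks must be positive for the ratio to be defined")
    return ka / ks


# Figures quoted in the abstract.
NEW_EXONS = 2695
ORTHOLOGOUS_GENES = 12419
REPORTED_RATE = 2.71e-3  # per gene per million years

if __name__ == "__main__":
    # Back-calculated, not stated in this section: roughly 80 million years.
    span = implied_time_span_myr(NEW_EXONS, ORTHOLOGOUS_GENES, REPORTED_RATE)
    print(f"Implied time span: {span:.1f} million years")

    # Hypothetical Ka and Ks values, purely for illustration.
    ratio = ka_ks_ratio(0.5, 0.25)
    print(f"Ka/Ks = {ratio:.2f} (a ratio above 1 is read as a sign of positive selection)")
```

Under the assumed formula, the reported counts and rate are mutually consistent with a time span of about 80 million years. A single exon with Ka/Ks > 1 is only suggestive; the abstract's argument rests on the proportion of such exons among new versus old exons.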
Evolutionary novelties in genomes have recently attracted increasing
attention (Lynch and Conery 2000; Prince and Pickett
2002; Long et al. 2003). Studies on young genes have afforded
great insight into the mechanism of origin of new genes and
their subsequent evolution. Genomic processes of new gene
origination involve several fundamental mechanisms, including
gene duplication, exon shuffling, retroposition, lateral gene
transfer, and transposable element assimilation (Long et al.
2003). These processes sometimes create new variants of genes,
but can also yield new genes with novel functions (e.g., Zhang et
al. 2002, 2004). Rapid evolution is a common phenomenon in
newly evolved genes, often driven by positive Darwinian selection
(Long and Langley 1993; Nurminsky et al. 1998; Johnson et
al. 2001; Wang et al. 2002; Zhang et al. 2002). Because
exon shuffling is widely recognized as important in the generation
of new genes (Gilbert 1978; Gilbert et al. 1997; Patthy 1999;
Kaessmann et al. 2002), how new exons, the basic units of genes
and of exon shuffling, originate and evolve becomes an important
question at the genome level.
So far, three processes have been proposed for the creation of
new exons: exaptation of transposable elements
(Brosius and Gould 1992; Makalowski et al. 1994; Nekrutenko
and Li 2001; Sorek et al. 2002), exon duplication (Kondrashov
and Koonin 2001; Letunic et al. 2002), and exonization
of intronic sequences (Gilbert 1978; Kondrashov and Koonin
2003). Makalowski et al. (1994) were the first to describe the
integration of an Alu element into the coding portion of the
human decay-accelerating factor (DAF) gene, and recently about
4% of human genes were found to contain transposable elements
in their coding regions (Nekrutenko and Li 2001). Duplication
of existing exons has also been reported. About 10% of all
genes contain tandemly duplicated exons that might confer further
evolutionary potential (Letunic et al. 2002). The most easily
conceived mechanism for creating new exons is exonization of
intronic sequences, because new splice sites can readily emerge
through mutation. Unfortunately, up to now, only a few potential
examples of such a process have been identified (e.g., Kondrashov
and Koonin 2003).
The majority of these pioneering reports on the origin of
new exons were formulated in the context of alternative splicing
(Modrek and Lee 2003; Ast 2004). Many important questions
directly related to the general picture of new exon origins are still
largely unanswered. For example, how often do new exons
emerge? What are their subsequent evolutionary patterns and driving
forces? Do new exons preferentially appear in particular genes?
⁹ These authors contributed equally to this work.
¹⁰ Corresponding authors.
E-mail wwang@mail.kiz.ac.cn; fax 86-871-5193137.
E-mail gksw@genomics.org.cn; fax 86-10-80498676.
E-mail wangj@genomics.org.cn; fax 86-10-80498676.
Article and publication are at http://www.genome.org/cgi/doi/10.1101/gr.3929705. Article published online before print in August 2005.
Letter
Genome Research 15:1258–1264 ©2005 by Cold Spring Harbor Laboratory Press; ISSN 1088-9051/05; www.genome.org