Origin and evolution of new exons in rodents

Wen Wang,1,9,10 Hongkun Zheng,2,9 Shuang Yang,1,3,9 Haijing Yu,4,9 Jun Li,2 Huifeng Jiang,1,3 Jianning Su,2 Lei Yang,2 Jianguo Zhang,2 Jason McDermott,5 Ram Samudrala,5 Jian Wang,2 Huanming Yang,2 Jun Yu,2 Karsten Kristiansen,8 Gane Ka-Shu Wong,2,6,10 and Jun Wang 2,7,8,10

1 CAS-Max Planck Junior Research Group, Key Laboratory of Cellular and Molecular Evolution, Kunming Institute of Zoology, Chinese Academy of Sciences, Kunming, Yunnan 650223, China;
2 Beijing Institute of Genomics, Chinese Academy of Sciences, Beijing 101300, China;
3 Graduate School of Chinese Academy Sciences, Beijing 100039, China;
4 Key Laboratory of Biodiversity Conservation and Utilization & Human Genetics Center of Yunnan University, Kunming, Yunnan 650091, China;
5 Computational Genomics Group, Department of Microbiology, University of Washington, Seattle, Washington 98195, USA;
6 UW Genome Center, Department of Medicine, University of Washington, Seattle, Washington 98195, USA;
7 The Institute of Human Genetics, University of Aarhus, DK-8000 Aarhus C, Denmark;
8 Department of Biochemistry and Molecular Biology, University of Southern Denmark, DK-5230 Odense M, Denmark

Gene number difference among organisms demonstrates that new gene origination is a fundamental biological process in evolution. Exon shuffling has been universally observed in the formation of new genes. Yet to be learned are the ways new exons originate and evolve, and how often new exons appear. To address these questions, we identified 2695 newly evolved exons in the mouse and rat by comparing the expressed sequences of 12,419 orthologous genes between human and mouse, using 743,856 pig ESTs as the outgroup. The new exon origination rate is about 2.71 × 10⁻³ per gene per million years. These new exons have markedly accelerated rates both of nonsynonymous substitutions and of insertions/deletions (indels).
A much higher proportion of new exons have Ka/Ks ratios >1 (where Ka is the nonsynonymous substitution rate and Ks is the synonymous substitution rate) than do the old exons shared by human and mouse, implying a role of positive selection in the rapid evolution. The majority of these new exons have sequences unique in the genome, suggesting that most new exons might originate through “exonization” of intronic sequences. Most of the new exons appear to be alternative exons that are expressed at low levels.

[Supplemental material is available online at www.genome.org.]

Evolutionary novelties in genomes have recently attracted increasing attention (Lynch and Conery 2000; Prince and Pickett 2002; Long et al. 2003). Studies on young genes have afforded great insight into the mechanism of origin of new genes and their subsequent evolution. Genomic processes of new gene origination involve several fundamental mechanisms, including gene duplication, exon shuffling, retroposition, lateral gene transfer, and transposable element assimilation (Long et al. 2003). These processes sometimes create new variants of genes, but can also yield new genes with novel functions (e.g., Zhang et al. 2002, 2004). Rapid evolution is a common phenomenon in newly evolved genes, often driven by positive Darwinian selection (Long and Langley 1993; Nurminsky et al. 1998; Johnson et al. 2001; Wang et al. 2002; Zhang et al. 2002). Because exon shuffling is widely recognized as important in the generation of new genes (Gilbert 1978; Gilbert et al. 1997; Patthy 1999; Kaessmann et al. 2002), how new exons, the basic units of gene and exon-shuffling, originate and evolve becomes an important question at the genome level.

So far, three processes have been proposed to be involved in the creation of new exons, i.e., exaptation of transposable elements (Brosius and Gould 1992; Makalowski et al. 1994; Nekrutenko and Li 2001; Sorek et al.
2002), exon duplication (Kondrashov and Koonin 2001; Letunic et al. 2002), and exonization of intronic sequences (Gilbert 1978; Kondrashov and Koonin 2003). Makalowski et al. (1994) were the first to describe the integration of an Alu element into the coding portion of the human decay-accelerating factor (DAF) gene, and recently about 4% of human genes were found to contain transposable elements in their coding regions (Nekrutenko and Li 2001). Duplication of existing exons has also been reported. About 10% of all genes contain tandemly duplicated exons that might confer further evolutionary potential (Letunic et al. 2002). The most easily conceived mechanism for creating new exons is exonization of intronic sequences, because new splice sites can readily emerge through mutation. Unfortunately, only a few potential examples of such a process have been identified so far (e.g., Kondrashov and Koonin 2003).

The majority of these pioneering reports on the origin of new exons were formulated in the context of alternative splicing (Modrek and Lee 2003; Ast 2004). Many important questions directly related to the general picture of new exon origins are still largely unanswered. For example, how often do new exons emerge? What are the subsequent evolutionary patterns and driving forces? Do new exons preferentially appear in particular genes?

9 These authors contributed equally to this work.
10 Corresponding authors. E-mail wwang@mail.kiz.ac.cn; fax 86-871-5193137. E-mail gksw@genomics.org.cn; fax 86-10-80498676. E-mail wangj@genomics.org.cn; fax 86-10-80498676.
Article and publication are at http://www.genome.org/cgi/doi/10.1101/gr.3929705. Article published online before print in August 2005.
Letter. Genome Research 15:1258–1264 ©2005 by Cold Spring Harbor Laboratory Press; ISSN 1088-9051/05; www.genome.org