Do disparate mechanisms of duplication add similar genes to the genome?
Jerel C. Davis and Dmitri A. Petrov
Department of Biological Sciences, Stanford University, 371 Serra Mall, Stanford, CA 94305-5020, USA

Gene duplication is the fundamental source of new genes. Biases in duplication have profound implications for the dynamics of gene content during evolution. In this article, we compare genes arising from whole genome duplication (WGD), smaller scale duplication (SSD) and singletons in Saccharomyces cerevisiae. Our results demonstrate that genes duplicated by WGD and SSD are similarly biased with respect to codon bias and evolutionary rate, although they differ significantly in their functional constituency.

Introduction
Gene duplication is the major source of new genes [1] and consequently is a central force affecting genome evolution [2]. Duplications are known to occur on two fundamental scales: whole genome duplication (WGD) and smaller scale duplication (SSD). WGD has been important in the evolutionary history of several animal and plant lineages [1,3–9] and SSDs, involving one or several genes, occur continuously by several mechanisms [2,10,11].

It is known that certain types of genes are more likely to lead to persistent duplicates than others [12–16]. It is unknown, however, whether both WGD and SSD lead to similar complements of persistent duplicate genes. New duplicates must pass through several sieving stages to become a persistent part of the genome and these sieving stages are somewhat different for WGD and SSD (Box 1). Therefore, these two modes of duplication might influence gene content in distinct ways. In this article, we investigated whether both SSD and WGD in Saccharomyces cerevisiae (WGD occurred ~100 million years ago (Mya) [4,17]) led to similar complements of persistent duplicate genes by investigating their functional classification, codon bias and rate of evolution before duplication.
We found that although both WGD and SSD sets have a greater codon bias and arise from more slowly evolving genes than those that remained as single copies, the two duplicate sets are enriched for different functional classes of genes.

Identification of genes in the three classes
We identified 2126 singleton genes, 356 WGD duplicates and 626 SSD duplicates in S. cerevisiae as described in the supplementary material online. For simplicity, we limited both WGD and SSD sets to consist only of duplicate genes with no other paralogs in the genome. The average

Box 1. The process leading to long-term duplicate survival
The WGD and SSD duplication processes differ in several respects (Figure I). First, in SSD a duplicate gene must arise through an independent mutational event. Second, the duplicated gene must spread from a single individual to the entire population (fixation); passing through this stage depends largely on whether the duplication is selectively advantageous, deleterious or neutral. Most genes never survive this step [26]. Finally, duplicate genes that are functionally redundant with the ancestral copy must diverge, so that they do not completely overlap in function (e.g. Refs [27–29]), for both duplicate copies to persist. Many duplicate genes are never preserved and are silenced within a few million years [19].

Although WGD shares the preservation step, the first two steps of the process differ. First, genes arising by WGD do not duplicate independently but duplicate with the rest of the genome. Second, the genes do not need to fix in the population but must instead survive a period of rapid genome rearrangement and gene loss [4,9,30]. Because these two processes differ, the complements of genes arising from each mechanism can also differ.
[Figure I: flow diagram of the two duplication routes. Small scale duplication: mutational generation, fixation, preservation. Whole genome duplication: WGD event, period of rapid genome rearrangement and gene loss, preservation.]

Figure I. An illustration of the steps leading to the generation and long-term persistence of duplicate genes. The blue colour identifies those steps that can potentially bias the set of eventual duplicate genes. The only step shared between the two processes is duplicate gene preservation.

Corresponding author: Davis, J.C. (jerel@stanford.edu).
Available online 11 August 2005
Update, TRENDS in Genetics Vol.21 No.10 October 2005
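
The partition into singletons, WGD duplicates and SSD duplicates, with both duplicate sets restricted to genes that have no other paralogs in the genome, can be illustrated with a minimal sketch. The actual identification procedure is given in the supplementary material and is not reproduced here. Everything below is an assumption made for illustration: the function name `classify_genes`, the input format (a map from each gene to its set of paralogs, plus a set of pairs attributed to the WGD), the label "excluded" for genes in larger families, and the placeholder gene names.

```python
from typing import Dict, FrozenSet, Set


def classify_genes(paralogs: Dict[str, Set[str]],
                   wgd_pairs: Set[FrozenSet[str]]) -> Dict[str, str]:
    """Label each gene 'singleton', 'WGD', 'SSD' or 'excluded'.

    paralogs  : hypothetical input mapping a gene to the set of its paralogs
    wgd_pairs : hypothetical input listing the pairs attributed to the WGD
    """
    classes: Dict[str, str] = {}
    for gene, partners in paralogs.items():
        if not partners:
            # No paralog anywhere in the genome.
            classes[gene] = "singleton"
        elif len(partners) == 1:
            (partner,) = partners
            if paralogs.get(partner, set()) != {gene}:
                # The partner has further paralogs, so the pair is not exclusive.
                classes[gene] = "excluded"
            elif frozenset((gene, partner)) in wgd_pairs:
                classes[gene] = "WGD"
            else:
                classes[gene] = "SSD"
        else:
            # Member of a family with more than two genes.
            classes[gene] = "excluded"
    return classes


# Placeholder gene names, not real S. cerevisiae identifiers.
example_paralogs = {
    "g1": set(),
    "g2": {"g3"}, "g3": {"g2"},
    "g4": {"g5"}, "g5": {"g4"},
    "g6": {"g7", "g8"}, "g7": {"g6", "g8"}, "g8": {"g6", "g7"},
}
example_wgd_pairs = {frozenset(("g2", "g3"))}
example_classes = classify_genes(example_paralogs, example_wgd_pairs)
```

In this toy input, `g1` is a singleton, `g2` and `g3` form a WGD pair, `g4` and `g5` form an SSD pair, and the three-member family is set aside, mirroring the restriction to duplicates with no other paralogs.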