TRENDS in Genetics Vol.18 No.12 December 2002 http://tig.trends.com 0168-9525/02/$ – see front matter © 2002 Elsevier Science Ltd. All rights reserved. PII: S0168-9525(02)02837-8 609

Research Update

For more than 30 years, expression divergence has been considered a major reason for retaining duplicate genes in a genome, but how often and how fast duplicate genes diverge in expression has not been studied at the genomic level. Using yeast microarray data, we show that expression divergence between duplicate genes is significantly correlated with their synonymous divergence (KS) and also with their nonsynonymous divergence (KA) if KA ≤ 0.3. Thus, expression divergence increases with evolutionary time, and KA is initially coupled with expression divergence. More interestingly, a large proportion of duplicate genes have diverged quickly in expression, and the vast majority of gene pairs eventually become divergent in expression. Indeed, more than 40% of gene pairs show expression divergence even when KS ≤ 0.10, and this proportion rises to >80% for KS > 1.5. Only a small fraction of ancient gene pairs do not show expression divergence.

Published online: 01 November 2002

Expression divergence between duplicate genes has long been a subject of great interest to geneticists and evolutionists [1–4]. Indeed, Ohno [2] and others [3,4] proposed expression divergence as the first step towards the retention of duplicate genes. In the past, however, studies of expression divergence were usually conducted on a limited number of gene families and so provided no general picture of the rate of expression divergence between duplicate genes in a genome. Fortunately, such a picture can now be obtained thanks to the advent of microarray gene expression technology (Box 1) and the complete sequences of many genomes. Indeed, using microarrays, Ferea et al. [5] showed that rapid changes in gene expression can occur in experimental lineages of yeast.
These advances notwithstanding, it remains difficult to date the divergence between two duplicate genes, and such a date is needed to infer the rate of expression divergence. In a pioneering study using microarray data from Saccharomyces cerevisiae, Wagner [6] found no significant correlation (−0.30, P = 0.18) between expression divergence and protein sequence divergence (d) between duplicate genes, and concluded that expression divergence and sequence divergence are decoupled. This result, however, does not imply that expression divergence and evolutionary time are decoupled, because d might not be a good proxy for divergence time. Because the rate of amino acid substitution varies tremendously among proteins [7,8], the same d value can correspond to very different divergence times in different gene pairs, so d cannot be used to date divergence across protein or gene pairs. By comparison, the rate of synonymous substitution is more uniform among genes [7,8], so KS is a better proxy for divergence time. We shall therefore rely more on KS than on d. To avoid using correlated data points, we selected independent pairs of duplicate genes in the yeast genome (Box 2). For each gene family, we started with the pair with the smallest KS and continued selecting pairs in order of increasing KS, because gene pairs with a small KS are fewer than those with a large KS and because a smaller KS more accurately reflects the time course of expression divergence. Moreover, we selected only gene pairs in which neither duplicate shows strong codon usage bias, because such bias can slow the accumulation of synonymous substitutions and thus make KS a poor proxy for divergence time. We then analysed the expression divergence of each gene pair using expression data from microarray analyses (see Box 2). Figure 1a shows a significant negative correlation (−0.47, P < 2 × 10⁻⁵) between ln[(1+R)/(1−R)] and KS, where R is the correlation coefficient between the expression profiles of the two duplicates; that is, the expression profiles of duplicates become less similar as KS increases.
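The pair-selection step described above can be sketched as a greedy procedure. This is a minimal sketch under stated assumptions, not the authors' code: it assumes that "independent" means no gene appears in more than one selected pair, that the codon-bias filter is applied before selection, and that codon usage bias is summarized by one per-gene score (for example, a codon adaptation index) compared against a cutoff. The `GenePair` type, the `select_independent_pairs` function and the `max_bias` threshold are all hypothetical.

```python
from dataclasses import dataclass
from typing import Dict, List, Set


@dataclass(frozen=True)
class GenePair:
    gene_a: str
    gene_b: str
    ks: float      # synonymous divergence (KS) between the two duplicates
    family: str    # gene family containing both duplicates


def select_independent_pairs(
    pairs: List[GenePair],
    codon_bias: Dict[str, float],
    max_bias: float = 0.5,  # hypothetical cutoff for "strong" codon usage bias
) -> List[GenePair]:
    """Greedily choose non-overlapping pairs within each family, smallest KS first."""
    by_family: Dict[str, List[GenePair]] = {}
    for p in pairs:
        # Exclude pairs in which either duplicate shows strong codon usage bias.
        if codon_bias[p.gene_a] > max_bias or codon_bias[p.gene_b] > max_bias:
            continue
        by_family.setdefault(p.family, []).append(p)

    selected: List[GenePair] = []
    for family_pairs in by_family.values():
        used: Set[str] = set()  # genes already placed in a selected pair
        for p in sorted(family_pairs, key=lambda q: q.ks):
            # Keep the pair only if it shares no gene with an earlier pick,
            # so that no gene contributes to more than one data point.
            if p.gene_a not in used and p.gene_b not in used:
                selected.append(p)
                used.update((p.gene_a, p.gene_b))
    return selected


# Usage with a hypothetical four-gene family.
family = [
    GenePair("A", "B", 0.2, "f1"),
    GenePair("A", "C", 0.5, "f1"),
    GenePair("B", "C", 0.6, "f1"),
    GenePair("C", "D", 0.9, "f1"),
]
bias = {"A": 0.1, "B": 0.1, "C": 0.1, "D": 0.1}
chosen = select_independent_pairs(family, bias)
print([(p.gene_a, p.gene_b) for p in chosen])  # prints [('A', 'B'), ('C', 'D')]
```

Sorting by KS before the greedy pass is what gives the smallest-KS pairs priority, in line with the rationale above that small-KS pairs are scarcer and more informative about the early time course.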
We used the transformation ln[(1+R)/(1−R)] instead of R to change the scale to one more appropriate for linear regression analysis (Box 2); a similar correlation (−0.54) is obtained between R itself and KS. A stronger correlation than this is not expected because KS is only a crude

Rapid divergence in expression between duplicate genes inferred from microarray data
Zhenglong Gu, Dan Nicolae, Henry H-S. Lu and Wen-Hsiung Li

Box 1. Yeast microarray data

A total of 208 cDNA microarray data points were compiled for this study. The dataset represents gene expression under various developmental and physiological conditions in the yeast life history (Table I). For some processes, more than one yeast strain or time course was studied; in such cases we randomly selected only one for each process. Log2-transformed ratios of gene expression in the experimental populations to that in the reference populations were used in the analysis.

Table I. Studied processes and number of data points in each process

Process                                                 Data points  Ref.
Sporulation                                             9            [a]
Cell cycle                                              17           [b]
Zinc regulation                                         9            [c]
YPD growth                                              10           [d]
Diamide treatment                                       8            [d]
Nitrogen depletion                                      10           [d]
DTT treatment                                           8            [d]
H2O2 treatment                                          10           [d]
Menadione treatment                                     9            [d]
Diauxic shift                                           7            [e]
Heat shock                                              7            [d]
Hyper-osmotic shock                                     7            [d]
Different carbon sources                                6            [d]
Amino acid starvation                                   5            [d]
Other experiments in response to environmental changes  86           [d]

References
a Chu, S. et al. (1998) The transcriptional program of sporulation in budding yeast. Science 282, 699–705
b Spellman, P.T. et al. (1998) Comprehensive identification of cell cycle-regulated genes of the yeast Saccharomyces cerevisiae by microarray hybridization. Mol. Biol. Cell 9, 3273–3297
c Lyons, T.J. et al. (2000) Genome-wide characterization of the Zap1p zinc-responsive regulon in yeast. Proc. Natl. Acad. Sci. U. S. A. 97, 7957–7962
d Gasch, A.P. et al. (2000) Genomic expression programs in the response of yeast cells to environmental changes. Mol. Biol. Cell 11, 4241–4257
e DeRisi, J.L. et al. (1997) Exploring the metabolic and genetic control of gene expression on a genomic scale. Science 278, 680–686
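The following is a minimal sketch of how one duplicate pair's expression similarity, and the transformed value ln[(1+R)/(1−R)] used in the regression, might be computed from Box 1-style data. It assumes that R is Pearson's correlation coefficient between the two genes' log2-ratio profiles across the same data points and that no values are missing. The function names and the example intensities are hypothetical; only the per-process counts come from Table I. Note that ln[(1+R)/(1−R)] equals 2·artanh(R), i.e. twice Fisher's z-transformation, which maps R from the interval (−1, 1) onto the whole real line.

```python
import math
from typing import List, Sequence

# Data points per process, copied from Table I (they sum to 208).
TABLE_I_DATA_POINTS = {
    "Sporulation": 9,
    "Cell cycle": 17,
    "Zinc regulation": 9,
    "YPD growth": 10,
    "Diamide treatment": 8,
    "Nitrogen depletion": 10,
    "DTT treatment": 8,
    "H2O2 treatment": 10,
    "Menadione treatment": 9,
    "Diauxic shift": 7,
    "Heat shock": 7,
    "Hyper-osmotic shock": 7,
    "Different carbon sources": 6,
    "Amino acid starvation": 5,
    "Other experiments in response to environmental changes": 86,
}


def log2_ratios(experimental: Sequence[float], reference: Sequence[float]) -> List[float]:
    """Point-by-point log2(experimental / reference); intensities must be positive."""
    if len(experimental) != len(reference):
        raise ValueError("profiles must have the same length")
    return [math.log2(e / r) for e, r in zip(experimental, reference)]


def pearson_r(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation coefficient between two expression profiles."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("need two profiles of equal length >= 2")
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def transformed_r(r: float) -> float:
    """ln[(1 + R) / (1 - R)], equal to 2 * atanh(R); defined only for -1 < R < 1."""
    if not -1.0 < r < 1.0:
        raise ValueError("R must lie strictly between -1 and 1")
    return math.log((1.0 + r) / (1.0 - r))


# Usage with made-up intensities for two duplicates over four conditions.
ref = [100.0, 100.0, 100.0, 100.0]
gene_a = log2_ratios([200.0, 50.0, 400.0, 100.0], ref)  # [1.0, -1.0, 2.0, 0.0]
gene_b = log2_ratios([150.0, 80.0, 300.0, 120.0], ref)
r = pearson_r(gene_a, gene_b)
print(sum(TABLE_I_DATA_POINTS.values()))  # prints 208
print(round(transformed_r(r), 3))
```

Working with log2 ratios makes up- and down-regulation symmetric: a twofold increase gives +1 and a twofold decrease gives −1. The sketch does not handle zero or negative intensities, which would need filtering before taking logarithms.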