TRENDS in Genetics Vol.18 No.12 December 2002
http://tig.trends.com 0168-9525/02/$ – see front matter © 2002 Elsevier Science Ltd. All rights reserved. PII: S0168-9525(02)02837-8
Research Update
For more than 30 years, expression divergence has been considered a major reason for retaining duplicated genes in a genome, but how often and how fast duplicate genes diverge in expression has not been studied at the genomic level. Using yeast microarray data, we show that expression divergence between duplicate genes is significantly correlated with their synonymous divergence ($K_S$) and also with their nonsynonymous divergence ($K_A$) if $K_A \le 0.3$. Thus, expression divergence increases with evolutionary time, and $K_A$ is initially coupled with expression divergence. More interestingly, a large proportion of duplicate genes have diverged quickly in expression, and the vast majority of gene pairs eventually become divergent in expression. Indeed, more than 40% of gene pairs show expression divergence even when $K_S \le 0.10$, and this proportion becomes >80% for $K_S > 1.5$. Only a small fraction of ancient gene pairs do not show expression divergence.
Published online: 01 November 2002
Expression divergence between duplicate genes has long been a subject of great interest to geneticists and evolutionists [1–4]. Indeed, Ohno [2] and others [3,4] proposed expression divergence as the first step towards the retention of duplicate genes. In the past, however, studies of expression divergence were usually conducted for a limited number of gene families, providing no general picture of the rate of expression divergence between duplicate genes in a genome. Fortunately, such a picture can now be obtained thanks to the advent of microarray gene expression technology (Box 1) and the complete sequences of many genomes. Indeed, using microarray technology, Ferea et al. [5] showed that rapid changes in gene expression can occur in experimental lineages of yeast.
These advances notwithstanding, there remains the difficulty of dating the divergence time between two duplicate genes, which is needed for inferring the rate of expression divergence. In a pioneering study using microarray data from Saccharomyces cerevisiae, Wagner [6] found no significant correlation (-0.30, P = 0.18) between expression divergence and protein sequence divergence (d) between duplicate genes, and concluded that expression divergence and sequence divergence are decoupled. This result, however, does not imply that expression divergence and evolutionary time are decoupled, because d might not be a good proxy for divergence time.
Because the rate of amino acid substitution varies tremendously among proteins [7,8], no single d value can be applied to date the divergence times of different protein or gene pairs. By comparison, the rate of synonymous substitution is more uniform among genes [7,8], so $K_S$ is a better proxy for divergence time. We shall therefore rely more on $K_S$ than on d.
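The argument for preferring $K_S$ over d can be made concrete with a minimal strict-molecular-clock sketch. This sketch assumes that two duplicates separated T years ago accumulate divergence 2 × rate × T, because each copy contributes one lineage. All rates below are hypothetical illustrative values and are not estimates from this study. The sketch also ignores multiple substitutions at the same site.

```python
# Minimal strict-clock sketch; every rate below is a hypothetical value.
SYNONYMOUS_RATE = 1.0e-8  # assumed per-site, per-year rate, shared by all genes


def expected_divergence(rate_per_year: float, years: float) -> float:
    """Expected divergence between two duplicates after `years` of separation.

    Both copies evolve independently after duplication, so divergence grows
    at twice the per-lineage rate. Multiple hits at one site are ignored.
    """
    return 2.0 * rate_per_year * years


def age_from_divergence(divergence: float, rate_per_year: float) -> float:
    """Invert the clock: T = divergence / (2 * rate)."""
    return divergence / (2.0 * rate_per_year)


if __name__ == "__main__":
    age = 5.0e6  # both hypothetical pairs duplicated 5 million years ago
    # Two proteins whose (hypothetical) amino acid rates differ tenfold.
    d_slow = expected_divergence(1.0e-10, age)
    d_fast = expected_divergence(1.0e-9, age)
    ks = expected_divergence(SYNONYMOUS_RATE, age)
    print(f"d for the slow protein: {d_slow:.4f}")
    print(f"d for the fast protein: {d_fast:.4f}")
    print(f"K_S for either pair:    {ks:.2f}")
    # One shared synonymous rate recovers the same age for both pairs,
    # whereas no single d-to-age conversion fits both proteins.
    print(f"age inferred from K_S:  {age_from_divergence(ks, SYNONYMOUS_RATE):.3g} years")
```

Running it prints d values of 0.0010 and 0.0100 for two pairs of identical age, while $K_S$ is 0.10 for both. This tenfold spread in d is why a single d value cannot date different pairs.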
To avoid using correlated data points, we selected independent pairs of duplicate genes in the yeast genome (Box 2). For each gene family, we started with the pair with the smallest $K_S$ and continued selecting pairs in order of increasing $K_S$. We did this for two reasons: gene pairs with a small $K_S$ are fewer than those with a large $K_S$, and a smaller $K_S$ more accurately reflects the time course of expression divergence. Moreover, we selected only gene pairs in which neither duplicate shows strong codon usage bias, because such bias can retard the increase of $K_S$ and so make $K_S$ a poor proxy for divergence time. We then analysed the expression divergence for each gene pair using expression data from microarray analyses (see Box 2).
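The exact selection rules are given in Box 2, which is not reproduced here. The sketch below is therefore a hypothetical greedy reading of the procedure described above and makes two assumptions. First, "independent" is taken to mean that no gene appears in more than one selected pair. Second, codon usage bias is represented as a per-gene score, such as a codon adaptation index, compared against an assumed cutoff. The names `GenePair`, `select_independent_pairs` and `bias_cutoff` are illustrative only.

```python
from dataclasses import dataclass
from typing import Dict, Iterable, List


@dataclass(frozen=True)
class GenePair:
    gene_a: str
    gene_b: str
    ks: float  # synonymous divergence K_S between the two duplicates


def select_independent_pairs(
    pairs: Iterable[GenePair],
    codon_bias: Dict[str, float],
    bias_cutoff: float = 0.5,  # hypothetical threshold for "strong" bias
) -> List[GenePair]:
    """Greedily pick non-overlapping duplicate pairs from one gene family.

    Assumptions (not from the paper): independence means no shared gene,
    and codon bias is a per-gene score compared against `bias_cutoff`.
    """
    used = set()
    selected = []
    # Start with the smallest K_S and continue with increasing K_S.
    for pair in sorted(pairs, key=lambda p: p.ks):
        # Drop pairs in which either duplicate shows strong codon usage bias.
        if codon_bias[pair.gene_a] > bias_cutoff or codon_bias[pair.gene_b] > bias_cutoff:
            continue
        # Drop pairs that reuse a gene, which would correlate data points.
        if pair.gene_a in used or pair.gene_b in used:
            continue
        selected.append(pair)
        used.update((pair.gene_a, pair.gene_b))
    return selected


if __name__ == "__main__":
    family = [
        GenePair("A", "B", 0.2),
        GenePair("C", "E", 0.4),
        GenePair("A", "C", 0.9),
        GenePair("C", "D", 1.1),
        GenePair("B", "D", 1.4),
    ]
    bias = {"A": 0.1, "B": 0.2, "C": 0.3, "D": 0.2, "E": 0.8}  # hypothetical scores
    for p in select_independent_pairs(family, bias):
        print(p.gene_a, p.gene_b, p.ks)  # prints "A B 0.2", then "C D 1.1"
```

In this toy family, the pair C–E is rejected because E is strongly biased. The pairs A–C and B–D are rejected because they reuse genes already taken by the lower-$K_S$ pairs A–B and C–D.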
Figure 1a shows a significant negative correlation (-0.47, $P < 2 \times 10^{-5}$) between $\ln[(1+R)/(1-R)]$ and $K_S$. We used the transformation $\ln[(1+R)/(1-R)]$ instead of R to change the scale to a more appropriate one for a linear regression analysis (Box 2); actually, a similar correlation (-0.54) is obtained between R and $K_S$. A stronger correlation than this is not expected because $K_S$ is only a crude
Rapid divergence in expression between duplicate genes
inferred from microarray data
Zhenglong Gu, Dan Nicolae, Henry H-S. Lu and Wen-Hsiung Li
Box 1. Yeast microarray data

A total of 208 cDNA microarray experimental data points were compiled for this study. The dataset represents gene expression under various developmental and physiological conditions in the yeast life history (Table I). For some processes, more than one yeast strain or time course was studied; in such cases we randomly selected only one of them for each process. Log2-transformed ratios of gene expression in experimental populations to reference populations were used in the analysis.

Table I. Studied processes and number of data points in each process

Process                                                  Data points  Ref.
Sporulation                                              9            [a]
Cell cycle                                               17           [b]
Zinc regulation                                          9            [c]
YPD growth                                               10           [d]
Diamide treatment                                        8            [d]
Nitrogen depletion                                       10           [d]
DTT treatment                                            8            [d]
H2O2 treatment                                           10           [d]
Menadione treatment                                      9            [d]
Diauxic shift                                            7            [e]
Heat shock                                               7            [d]
Hyper-osmotic shock                                      7            [d]
Different carbon sources                                 6            [d]
Amino acid starvation                                    5            [d]
Other experiments in response to environmental changes   86           [d]

References
a Chu, S. et al. (1998) The transcriptional program of sporulation in budding yeast. Science 282, 699–705
b Spellman, P.T. et al. (1998) Comprehensive identification of cell cycle-regulated genes of the yeast Saccharomyces cerevisiae by microarray hybridization. Mol. Biol. Cell 9, 3273–3297
c Lyons, T.J. et al. (2000) Genome-wide characterization of the Zap1p zinc-responsive regulon in yeast. Proc. Natl. Acad. Sci. U. S. A. 97, 7957–7962
d Gasch, A.P. et al. (2000) Genomic expression programs in the response of yeast cells to environmental changes. Mol. Biol. Cell 11, 4241–4257
e DeRisi, J.L. et al. (1997) Exploring the metabolic and genetic control of gene expression on a genomic scale. Science 278, 680–686
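The precise definition of the expression-divergence measure R is given in Box 2, which is not reproduced here. As a minimal sketch, assume that R is Pearson's correlation coefficient between the log2-ratio profiles of two duplicates across the data points of Table I. Also assume that missing measurements are encoded as None, which is a hypothetical choice of data layout. Under these assumptions, R and the transformed value ln[(1+R)/(1−R)] could be computed as below. The transformation equals 2·artanh(R), twice Fisher's z-transformation. The functions are written in plain Python to stay dependency-free, and their names are illustrative.

```python
import math
from typing import List, Optional

Profile = List[Optional[float]]  # log2 ratios; None marks a missing value


def pearson_r(x: Profile, y: Profile) -> float:
    """Pearson correlation over positions where both profiles have data."""
    shared = [(a, b) for a, b in zip(x, y) if a is not None and b is not None]
    n = len(shared)
    if n < 3:
        raise ValueError("need at least three shared data points")
    mean_x = sum(a for a, _ in shared) / n
    mean_y = sum(b for _, b in shared) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in shared)
    sxx = sum((a - mean_x) ** 2 for a, _ in shared)
    syy = sum((b - mean_y) ** 2 for _, b in shared)
    if sxx == 0 or syy == 0:
        raise ValueError("a constant profile has no defined correlation")
    return sxy / math.sqrt(sxx * syy)


def transformed_r(r: float) -> float:
    """ln[(1 + R) / (1 - R)], equal to 2 * atanh(R); undefined at R = +/-1."""
    return math.log((1 + r) / (1 - r))


if __name__ == "__main__":
    # Hypothetical log2-ratio profiles for two duplicates over six arrays.
    dup1 = [0.8, -1.2, 0.3, None, 2.1, -0.4]
    dup2 = [0.5, -0.9, 0.6, 1.1, 1.7, -0.2]
    r = pearson_r(dup1, dup2)
    print(f"R = {r:.3f}, ln[(1+R)/(1-R)] = {transformed_r(r):.3f}")
```

A lower R, and hence a smaller transformed value, means the duplicates' expression profiles are less alike. This is why greater expression divergence at larger $K_S$ appears as a negative correlation in Figure 1a.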