Although there is increasing evidence that
eukaryotic gene order is not always random,
there is no evidence that putatively
favourable gene arrangements are preserved
by selection more than expected by chance.
In yeast (Saccharomyces cerevisiae), for
example, co-expressed genes tend to be
linked, but whether such gene pairs tend to
remain linked more often than expected
under null neutral expectations is not
known. We show, using gene pairs in the
S. cerevisiae–Candida albicans comparison,
that highly co-expressed gene pairs are
conserved as pairs at about twice the
average rate. However, co-expressed genes
also tend to be in close physical proximity
and, as expected from a null neutral model,
genes (be they co-expressed or not) that are
physically close together tend to be retained
more often. This physical proximity,
however, only accounts for a small
proportion of the enhanced degree of
conservation of co-expressed gene pairs.
These results demonstrate that purely
neutralist models of gene order evolution
are not realistic.
Published online: 01 November 2002
Much current data suggests that the
‘randomly arranged beans on a string’
model of eukaryotic genomes is not
adequate. Not only are certain sorts of
genes especially prevalent on the
X chromosome [1,2], but in humans [3],
flies [4], yeast [5] and worm [6], genes of
similar expression profile tend to be
clustered. In striking contrast, there is
very little evidence to suggest that any
putatively adaptive clusters remain
conserved more often than expected of
any random set of genes, with obvious
exceptions such as Hox clusters [7]. Based
on a limited sample it has, however, been
suggested that co-expressed genes in
yeast might be conserved at a higher rate
than expected [8], although a broad-scale
analysis failed to show that gene
orientation (a putative covariate of
co-expression) was biased in conserved
gene pairs [9].
To address this issue we assembled
a dataset of S. cerevisiae gene pairs
(i.e. nearest neighbours) for which we
could define the orthologue for both genes
in Candida (1850 pairs). Orthology was
determined using reciprocal best hits in
BLAST analysis, as previously described
[10]. Chromosomal location in yeast
was derived from accession numbers
NC_001133-48. Protein sequence and
location data for C. albicans were obtained
from the Stanford DNA Sequencing and
Technology Center website at
http://www-sequence.stanford.edu/group/candida/index.html;
contig version 6.
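The reciprocal-best-hit criterion can be illustrated with a minimal Python sketch. It assumes the best BLAST hit in each direction has already been extracted into two dictionaries; the function name and the toy gene identifiers are hypothetical and are not taken from the original analysis.

```python
def reciprocal_best_hits(best_a_to_b, best_b_to_a):
    """Return {gene_a: gene_b} for genes that are each other's best hit.

    Both arguments map a query gene to its single best hit in the other
    genome (assumed to have been parsed from BLAST output beforehand).
    """
    return {
        a: b
        for a, b in best_a_to_b.items()
        if best_b_to_a.get(b) == a
    }


# Hypothetical toy best-hit tables; identifiers are placeholders.
yeast_to_candida = {"ySc1": "ca1", "ySc2": "ca2", "ySc3": "ca2"}
candida_to_yeast = {"ca1": "ySc1", "ca2": "ySc2"}

orthologues = reciprocal_best_hits(yeast_to_candida, candida_to_yeast)
print(orthologues)
```

In this toy input, ySc3 is dropped because its best hit (ca2) points back to a different yeast gene.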
Of the 1850 yeast gene pairs with
Candida orthologues, we eliminated those
that were pairs of tandem duplicates (as
defined by pairwise BLAST score E < 10⁻²),
those that were overlapping or with no
space between the genes and those for
which we could not define the extent of
co-expression between neighbouring genes.
This left a total of 1817 gene pairs. The
dataset includes 166 pairs in yeast that
remain as nearest neighbours in Candida.
These we consider to be the gene pairs with
conserved linkage. The overall proportion
conserved (9%) is low, but this is more a
measure of the long time since common
ancestry (~200 million years) than an
indication of the presence or absence of
selection. Indeed, comparison can be made
with the evolution of codon usage bias:
in yeast, highly expressed genes show
strong codon usage bias indicative of
selection acting on ‘silent’ point mutations,
but in comparisons of these genes with
orthologues in Candida, the silent site
substitution rate is very high and so close to
saturation as to be all but uninterpretable.
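As an illustration of how conserved linkage might be scored, the sketch below checks whether the orthologues of a yeast neighbour pair are also adjacent in Candida. It assumes that gene order is available as an ordered list of genes per contig and that adjacency simply means consecutive positions in that list; the function names and toy identifiers are hypothetical. For reference, the 166 conserved pairs out of 1817 reported above correspond to roughly 9%.

```python
def adjacent_pairs(contigs):
    """Collect unordered neighbour pairs from {contig: [genes in order]}."""
    pairs = set()
    for genes in contigs.values():
        for left, right in zip(genes, genes[1:]):
            pairs.add(frozenset((left, right)))
    return pairs


def conserved_linkage(yeast_pairs, orthologue_of, candida_contigs):
    """Return the conserved yeast pairs and the proportion conserved."""
    candida_adjacent = adjacent_pairs(candida_contigs)
    conserved = [
        (g1, g2)
        for g1, g2 in yeast_pairs
        if frozenset((orthologue_of[g1], orthologue_of[g2])) in candida_adjacent
    ]
    return conserved, len(conserved) / len(yeast_pairs)


# Hypothetical toy data; identifiers are placeholders.
yeast_pairs = [("A", "B"), ("C", "D"), ("E", "F")]
orthologue_of = {"A": "a", "B": "b", "C": "c", "D": "d", "E": "e", "F": "f"}
candida_contigs = {"c1": ["b", "a", "c"], "c2": ["d", "x", "e", "f"]}

conserved, proportion = conserved_linkage(yeast_pairs, orthologue_of, candida_contigs)
print(conserved, round(proportion, 2))
```

Using unordered pairs means that a pair counts as conserved regardless of the relative order or orientation of the two genes, which is an assumption of this sketch.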
To establish whether co-expression is
important for retention of linkage, we need
to define the extent of co-expression.
We took the expression profiles from the
microarray data compiled by the Eisen lab
(http://rana.lbl.gov/EisenData.htm) and,
using normalized data, for each linked
pair calculated the Pearson correlation
coefficient (r) between the two genes, a
measure of their degree of co-expression.
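For concreteness, the Pearson correlation between two expression profiles can be computed as in the following sketch. The profiles are hypothetical toy values, not microarray data, and the handling of missing measurements (which real expression compendia usually contain) is not addressed here.

```python
import math


def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length profiles."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("profiles must have equal length of at least 2")
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        raise ValueError("a constant profile has no defined correlation")
    return sxy / math.sqrt(sxx * syy)


# Hypothetical normalized expression values across four conditions.
gene_a = [1.0, 2.0, 3.0, 4.0]
gene_b = [2.0, 4.0, 6.0, 8.0]   # rises with gene_a
gene_c = [4.0, 3.0, 2.0, 1.0]   # falls as gene_a rises

r_ab = pearson_r(gene_a, gene_b)
r_ac = pearson_r(gene_a, gene_c)
print(r_ab, r_ac)
```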
If co-expression were important in the
retention of a gene pair, then we would
expect that as the degree of co-expression
goes up, so would the probability of
conservation of linkage. However, we have
no reason to suppose that this is necessarily
a gradual effect. For most gene pairs, the
r values simply represent random noise:
a small positive value for r should not be
taken as evidence of more co-ordinated
expression than an equally small negative
value. Only when the r value is especially
high do we suspect some functionally
significant co-ordination in the
regulation of the two genes.
Therefore, to provide an indication of
whether co-expression is important, we
performed a sliding-window analysis of
gene pairs organized by the ranked
r value, calculating mean r, the proportion
conserved and mean intergene spacer.
As can be seen in Fig. 1, at high values of
mean r (highly co-expressed genes), the
proportion conserved does indeed greatly
exceed null expectations. This provides
the first whole-genome analysis to
indicate that co-expressed genes are
conserved more than expected by chance.
As expected then, the gene pairs that are
conserved have higher r values (i.e. are
more likely to be co-expressed) than those
that are not conserved (Mann–Whitney
U test, P = 0.01).
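The sliding-window procedure described above can be sketched as follows. The window size and step are not stated in the text, so the values used here are purely illustrative assumptions, as are the toy gene pairs; each pair is represented as a tuple of (r, conserved, intergene spacer in base pairs).

```python
def sliding_window(pairs, window, step=1):
    """Summarize gene pairs in windows over their rank by r.

    pairs: list of (r, conserved, spacer_bp) tuples.
    Returns one (mean r, proportion conserved, mean spacer) per window.
    """
    ranked = sorted(pairs, key=lambda p: p[0])
    rows = []
    for start in range(0, len(ranked) - window + 1, step):
        chunk = ranked[start:start + window]
        mean_r = sum(p[0] for p in chunk) / window
        prop_conserved = sum(1 for p in chunk if p[1]) / window
        mean_spacer = sum(p[2] for p in chunk) / window
        rows.append((mean_r, prop_conserved, mean_spacer))
    return rows


# Hypothetical toy gene pairs: (r, conserved?, intergene spacer in bp).
pairs = [
    (0.9, True, 200), (-0.2, False, 800), (0.1, False, 600),
    (0.7, True, 300), (0.0, False, 700), (0.3, True, 500),
]

rows = sliding_window(pairs, window=3)
for mean_r, prop_conserved, mean_spacer in rows:
    print(round(mean_r, 2), round(prop_conserved, 2), round(mean_spacer, 1))
```

Reporting the mean spacer alongside the proportion conserved in each window makes it possible to see whether the two quantities covary with r, which bears on the interpretation of the result.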
There is, nonetheless, a difficulty with
the interpretation of the above result.
Examination of Fig. 1 also indicates that
as the mean r increases, the mean
intergenic distance decreases. The excess
conservation of co-expressed gene pairs
might then trivially be explained as a
consequence of null neutral evolution of
gene order. The simplest explanation for
the conservation of linkage is that gene
order re-arrangements (e.g. inversions)
occur at random locations, that they are
tolerated only if they disrupt intergene
spacer and that all such tolerated
re-arrangements are without selective
consequences. The tolerated ones then
could spread by drift (i.e. neutral
evolution). Gene pairs with small
intergene spacer should then be expected
to be conserved as nearest neighbours
more often. Indeed, as predicted, the genes
in conserved pairs are closer together than
those in the non-conserved pairs (mean
intergene spacer of unconserved
TRENDS in Genetics Vol.18 No.12 December 2002
http://tig.trends.com 0168-9525/02/$ – see front matter © 2002 Elsevier Science Ltd. All rights reserved. PII: S0168-9525(02)02813-5
604 Research Update
Natural selection promotes the conservation of linkage
of co-expressed genes
Laurence D. Hurst, Elizabeth J. B. Williams and Csaba Pál