Although there is increasing evidence that eukaryotic gene order is not always random, there is no evidence that putatively favourable gene arrangements are preserved by selection more often than expected by chance. In yeast (Saccharomyces cerevisiae), for example, co-expressed genes tend to be linked, but it is not known whether such gene pairs remain linked more often than expected under a null neutral model. Using gene pairs in the S. cerevisiae–Candida albicans comparison, we show that highly co-expressed gene pairs are conserved as pairs at about twice the average rate. However, co-expressed genes also tend to be physically close to one another, and, as expected from a null neutral model, genes that are physically close together (whether co-expressed or not) tend to be retained more often. This physical proximity, however, accounts for only a small proportion of the enhanced conservation of co-expressed gene pairs. These results demonstrate that purely neutralist models of gene order evolution are not realistic.

Published online: 01 November 2002

Much current data suggests that the ‘randomly arranged beans on a string’ model of eukaryotic genomes is not adequate. Not only are certain sorts of genes especially prevalent on the X chromosome [1,2], but in humans [3], flies [4], yeast [5] and worms [6], genes with similar expression profiles tend to be clustered. In striking contrast, there is very little evidence that any putatively adaptive clusters remain conserved more often than expected of any random set of genes, with obvious exceptions such as Hox clusters [7]. On the basis of a limited sample, however, it has been suggested that co-expressed genes in yeast might be conserved at a higher rate than expected [8], although a broad-scale analysis failed to show that gene orientation (a putative covariate of co-expression) was biased in conserved gene pairs [9]. To address this issue we assembled a dataset of S. cerevisiae gene pairs (i.e.
nearest neighbours) for which we could define the orthologue of both genes in Candida (1850 pairs). Orthology was determined using reciprocal best hits in BLAST analysis, as previously described [10]. Chromosomal locations in yeast were derived from accession numbers NC_001133–48. Protein sequence and location data for C. albicans were obtained from the Stanford DNA Sequencing and Technology Center website at http://www-sequence.stanford.edu/group/candida/index.html (contig version 6).

Of the 1850 yeast gene pairs with Candida orthologues, we eliminated pairs of tandem duplicates (defined by a pairwise BLAST score of E < 10⁻²), pairs that overlapped or had no space between the genes, and pairs for which we could not define the extent of co-expression between the neighbouring genes. This left a total of 1817 gene pairs, of which 166 remain nearest neighbours in Candida; we consider these to be the gene pairs with conserved linkage. The overall proportion conserved (9%) is low, but this reflects the long time since common ancestry (~200 million years) more than the presence or absence of selection. A comparison can be made with the evolution of codon usage bias: in yeast, highly expressed genes show strong codon usage bias indicative of selection acting on ‘silent’ point mutations, but when these genes are compared with their Candida orthologues, the silent-site substitution rate is so high, and so close to saturation, as to be all but uninterpretable.

To establish whether co-expression is important for the retention of linkage, we need to define the extent of co-expression. We took the expression profiles from the microarray data compiled by the Eisen lab (http://rana.lbl.gov/EisenData.htm) and, using normalized data, calculated for each linked pair the Pearson correlation coefficient (r) between the two genes as a measure of their degree of co-expression.
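The pair-definition steps described above can be sketched in plain Python. This is a minimal illustration under stated assumptions, not the authors' pipeline: it assumes each gene's top BLAST hit in the other species has already been computed (the dicts passed to `reciprocal_best_hits` are hypothetical inputs), represents every chromosome or contig as an ordered list of gene IDs, treats two Candida genes as nearest neighbours when they are adjacent in that list, and assumes complete, equal-length normalized expression profiles. The tandem-duplicate and overlap filters are omitted, and all function names and gene IDs are hypothetical.

```python
from math import sqrt

# Hypothetical helper names and toy gene IDs; a sketch, not the authors' pipeline.


def reciprocal_best_hits(best_a_to_b, best_b_to_a):
    """Orthologue map {gene_a: gene_b} keeping only reciprocal best hits.
    best_a_to_b / best_b_to_a hold each gene's top BLAST hit in the other
    species (assumed to be computed upstream)."""
    return {a: b for a, b in best_a_to_b.items() if best_b_to_a.get(b) == a}


def adjacent_pairs(order):
    """Nearest-neighbour pairs along one chromosome or contig (ordered gene IDs)."""
    return list(zip(order, order[1:]))


def conserved_pairs(yeast_chroms, candida_contigs, orthologue):
    """Count yeast neighbour pairs in which both genes have orthologues, and list
    those whose orthologues are also adjacent in Candida (conserved linkage)."""
    candida_adjacent = {frozenset(p) for contig in candida_contigs
                        for p in adjacent_pairs(contig)}
    considered, conserved = 0, []
    for chrom in yeast_chroms:
        for g1, g2 in adjacent_pairs(chrom):
            if g1 in orthologue and g2 in orthologue:
                considered += 1
                if frozenset((orthologue[g1], orthologue[g2])) in candida_adjacent:
                    conserved.append((g1, g2))
    return considered, conserved


def pearson_r(x, y):
    """Pearson correlation of two complete, equal-length expression profiles."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


if __name__ == "__main__":
    # Y4 is dropped: its best hit C4 points back to Y9, so the hit is not reciprocal.
    orth = reciprocal_best_hits({"Y1": "C1", "Y2": "C2", "Y3": "C3", "Y4": "C4"},
                                {"C1": "Y1", "C2": "Y2", "C3": "Y3", "C4": "Y9"})
    n, cons = conserved_pairs([["Y1", "Y2", "Y3", "Y4"]],
                              [["C2", "C1", "C9", "C3", "C4"]], orth)
    print(n, cons)  # 2 [('Y1', 'Y2')]
```

Applied to the real counts, the proportion conserved is simply `166 / 1817`, about 0.091, which is the 9% quoted in the text.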
If co-expression were important in the retention of a gene pair, we would expect the probability of conservation of linkage to rise with the degree of co-expression. However, we have no reason to suppose that this is necessarily a gradual effect. For most gene pairs, the r values simply represent random noise: a small positive value of r should not be taken as evidence of more co-ordinated expression than an equally small negative value. Only when the r value is especially high do we suspect functionally significant co-ordination in the regulation of the two genes. Therefore, to assess whether co-expression is important, we performed a sliding-window analysis of gene pairs ranked by r, calculating for each window the mean r, the proportion conserved and the mean intergene spacer. As can be seen in Fig. 1, at high values of mean r (highly co-expressed genes), the proportion conserved does indeed greatly exceed null expectations. This is the first whole-genome analysis to indicate that co-expressed genes are conserved more often than expected by chance. As expected, then, the gene pairs that are conserved have higher r values (i.e. are more likely to be co-expressed) than those that are not conserved (Mann–Whitney U test, P = 0.01).

There is, nonetheless, a difficulty with the interpretation of this result. Fig. 1 also shows that as mean r increases, mean intergenic distance decreases. The excess conservation of co-expressed gene pairs might then be trivially explained as a consequence of null neutral evolution of gene order. Under the simplest such model, gene order rearrangements (e.g. inversions) occur at random locations, are tolerated only if their breakpoints fall within intergene spacer (rather than within genes), and all such tolerated rearrangements are without selective consequences. The tolerated rearrangements could then spread by drift (i.e. neutral evolution).
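The ranking, testing and null-model reasoning above can be sketched in plain Python. This is a hypothetical illustration, not the authors' code. The window size and step are arbitrary parameters, because the text does not state them. The Mann–Whitney P value uses the large-sample normal approximation with mid-ranks for ties but no tie correction to the variance. `neutral_retention` is a deliberately crude toy version of the null neutral model: tolerated breakpoints land uniformly at random within intergene spacer, so a pair's chance of staying linked falls as its spacer grows.

```python
import random
from bisect import bisect_right
from itertools import accumulate
from math import erfc, sqrt

# Hypothetical helpers; parameters and inputs are illustrative assumptions.


def sliding_window(pairs, window, step=1):
    """pairs: (r, conserved, spacer_bp) tuples. Rank by r, then report
    (mean r, proportion conserved, mean spacer) for each window."""
    ranked = sorted(pairs, key=lambda p: p[0])
    out = []
    for start in range(0, len(ranked) - window + 1, step):
        w = ranked[start:start + window]
        out.append((sum(p[0] for p in w) / window,
                    sum(p[1] for p in w) / window,  # True counts as 1
                    sum(p[2] for p in w) / window))
    return out


def mann_whitney_u(x, y):
    """U for sample x versus y, with a two-sided P value from the normal
    approximation. Ties get mid-ranks; the variance is NOT tie-corrected."""
    combined = sorted([(v, 0) for v in x] + [(v, 1) for v in y])
    ranks = [0.0] * len(combined)
    i = 0
    while i < len(combined):
        j = i
        while j + 1 < len(combined) and combined[j + 1][0] == combined[i][0]:
            j += 1
        for k in range(i, j + 1):
            ranks[k] = (i + j) / 2 + 1  # 1-based mid-rank of the tied run
        i = j + 1
    rank_sum_x = sum(r for r, (_, group) in zip(ranks, combined) if group == 0)
    n1, n2 = len(x), len(y)
    u = rank_sum_x - n1 * (n1 + 1) / 2
    z = (u - n1 * n2 / 2) / sqrt(n1 * n2 * (n1 + n2 + 1) / 12)
    return u, erfc(abs(z) / sqrt(2))


def neutral_retention(spacers, n_breaks, rng):
    """Toy null model: n_breaks tolerated breakpoints land uniformly along the
    concatenated intergene spacers; a pair stays linked only if no breakpoint
    hits its spacer. Returns one survived (True) / broken (False) flag per spacer."""
    bounds = list(accumulate(spacers))  # cumulative end coordinate of each spacer
    broken = {bisect_right(bounds, rng.random() * bounds[-1])
              for _ in range(n_breaks)}
    return [i not in broken for i in range(len(spacers))]


if __name__ == "__main__":
    rng = random.Random(1)
    spacers = [200] * 50 + [2000] * 50  # hypothetical spacer lengths (bp)
    short_kept = long_kept = 0
    for _ in range(200):
        kept = neutral_retention(spacers, 20, rng)
        short_kept += sum(kept[:50])
        long_kept += sum(kept[50:])
    # Under this toy null model, short spacers are retained more often.
    print(short_kept / 10000, long_kept / 10000)
```

In this toy model the retention advantage of closely spaced pairs arises with no selection at all. That is why the proximity of co-expressed genes has to be accounted for before their excess conservation can be attributed to selection.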
Research Update: Natural selection promotes the conservation of linkage of co-expressed genes. Laurence D. Hurst, Elizabeth J. B. Williams and Csaba Pál. TRENDS in Genetics Vol.18 No.12, December 2002 (http://tig.trends.com). © 2002 Elsevier Science Ltd. PII: S0168-9525(02)02813-5

Gene pairs with small intergene spacer should then be expected to be conserved as nearest neighbours more often. Indeed, as predicted, the genes in conserved pairs are closer together than those in non-conserved pairs (mean intergene spacer of unconserved