Genome Analysis

In plants, highly expressed genes are the least compact

Xin-Ying Ren 1, Oscar Vorst 1, Mark W.E.J. Fiers 1, Willem J. Stiekema 2 and Jan-Peter Nap 1,2

1 Applied Bioinformatics, Plant Research International, Wageningen University and Research Centre, 6708 PB Wageningen, The Netherlands
2 Centre for BioSystems Genomics, 6700 AA Wageningen, The Netherlands

In both the monocot rice and the dicot Arabidopsis, highly expressed genes have more and longer introns and a larger primary transcript than genes expressed at a low level: more highly expressed genes tend to be less compact than genes expressed at a lower level. In animal genomes, the opposite holds. Although the length differences in plant genes are much smaller than in animals, these findings indicate that plant genes differ from animal genes in this respect. Explanations for the relationship between gene configuration and gene expression in animals might be (or might have been) less important in plants. We speculate that selection, if any, on genome configuration has taken a different turn after the divergence of plants and animals.

Introduction

A major issue in relating genome structure to gene expression is the relationship between the relative activity of genes and their position and/or structure.
In organisms as diverse as human [1–4] and Caenorhabditis elegans [1], highly expressed genes have fewer and shorter introns, shorter coding sequences and shorter intergenic regions [1–5]. This compact nature of highly expressed genes has been explained by selection for transcriptional efficiency to reduce time and energy [1], by a regional mutation bias that positions highly expressed genes in domains more prone to deletions [3], or by a genomic design into open chromatin [4].

Here we present a whole-genome analysis of the relationship between gene structure and gene expression for two widely diverged plant species, the monocotyledonous plant rice (Oryza sativa) and the dicotyledonous plant Arabidopsis thaliana, with data from two different expression platforms, massively parallel signature sequencing (MPSS) and microarrays. In both plant genomes, highly expressed genes have more and longer introns and a longer primary transcript. In short, they are less compact than the genes expressed at a low level. This contrasts with the relationship between gene expression and gene structure in human and C. elegans, although the absolute differences between plant genes are considerably smaller than for human genes. These findings could suggest that the outcome of selection has been different between animals and plants.

Analysis of plant gene expression in relationship to gene structure

The public domain MPSS expression data for Arabidopsis [6] (http://mpss.udel.edu/at/) and rice [7] (http://mpss.udel.edu/rice/) offer good genome-wide expression coverage in a range of different expression libraries and allow easy quantification. To correlate expression data with gene structure, we obtained Arabidopsis and rice genome sequences and annotations from The Institute for Genomic Research (TIGR). All genes annotated as either (retro)transposons or pseudogenes were excluded from the analysis and, in cases of alternative splicing, the longest variant was used in the analyses.
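The gene-selection step described above (dropping (retro)transposons and pseudogenes, then keeping the longest splice variant per gene) can be sketched as follows. This is a minimal illustration, not the authors' pipeline: the `GeneModel` record, its field names, the keyword matching on the annotation text and the use of primary transcript length to define "longest" are all assumptions made for the example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class GeneModel:
    """Hypothetical record for one annotated gene model (splice variant)."""
    gene_id: str      # locus identifier shared by all variants of a gene
    model_id: str     # identifier of this particular variant
    annotation: str   # free-text annotation (assumed format)
    length: int       # assumed: primary transcript length in bp


# Assumption: excluded classes are recognised by keywords in the annotation.
# "retrotransposon" is matched too, because it contains "transposon".
EXCLUDED_TERMS = ("transposon", "pseudogene")


def select_representative_models(models):
    """Return {gene_id: GeneModel}, keeping the longest non-excluded variant."""
    best = {}
    for model in models:
        text = model.annotation.lower()
        if any(term in text for term in EXCLUDED_TERMS):
            continue  # skip (retro)transposons and pseudogenes
        current = best.get(model.gene_id)
        if current is None or model.length > current.length:
            best[model.gene_id] = model
    return best


example_models = [
    GeneModel("g1", "g1.1", "expressed protein", 1200),
    GeneModel("g1", "g1.2", "expressed protein", 1500),
    GeneModel("g2", "g2.1", "retrotransposon protein, putative", 3000),
    GeneModel("g3", "g3.1", "pseudogene", 800),
]
representatives = select_representative_models(example_models)
```

With these invented records, only gene `g1` survives the filter and it is represented by its longer variant `g1.2`.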
We mapped the MPSS expression data to their position in the Arabidopsis (TIGR version 5) and rice (TIGR version 3) genomes, and all 17-base MPSS tags with a unique position were taken into account. Genes without expression data were not included in the analysis.

To compare the levels of expression of genes in different expression libraries, we sorted the expression values in each library in ascending order, divided them into five groups, each containing 20% of the population, and assigned an expression rank from 1 (low expression) to 5 (high expression). Where the cutoff caused equal expression values to fall into different rank groups (which happened notably with zero expression), the expression values were placed in the lower rank group. For each gene, we averaged the expression ranks over all libraries. This averaged expression rank (rE) indicates the relative expression level of each gene under all conditions analyzed. Alternative methods of expression analysis (see the supplementary material online) give results similar to those found for rE. As the rE can be influenced in part by the number of libraries

Corresponding author: Nap, J.-P. (janpeter.nap@wur.nl)
Available online 24 August 2006.
Update, TRENDS in Genetics Vol.22 No.10
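The ranking procedure described above (quintile ranks per library, tied values at a cutoff placed in the lower group, ranks averaged across libraries to give rE) can be sketched as follows. This is a minimal sketch under stated assumptions, not the authors' code: each library is assumed to be a dictionary mapping gene identifiers to expression values, the function names are hypothetical, and a gene missing from a library is simply averaged over the libraries in which it appears.

```python
from bisect import bisect_left


def quintile_ranks(expression):
    """Assign ranks 1 (low) to 5 (high) within one library.

    expression: dict gene_id -> expression value (assumed input format).
    Tied values all take the rank of the first tied position in the sorted
    list, so ties that straddle a cutoff go to the lower rank group.
    """
    values = sorted(expression.values())
    n = len(values)
    ranks = {}
    for gene, value in expression.items():
        first = bisect_left(values, value)  # 0-based index of first tied value
        ranks[gene] = first * 5 // n + 1    # five groups of 20% each
    return ranks


def averaged_expression_rank(libraries):
    """Average the per-library ranks of each gene (the rE of the text)."""
    totals, counts = {}, {}
    for expression in libraries:
        for gene, rank in quintile_ranks(expression).items():
            totals[gene] = totals.get(gene, 0) + rank
            counts[gene] = counts.get(gene, 0) + 1
    return {gene: totals[gene] / counts[gene] for gene in totals}


# Invented values for ten genes in two libraries, for illustration only.
genes = [f"g{i}" for i in range(10)]
library_a = dict(zip(genes, [0, 0, 0, 5, 8, 12, 20, 33, 50, 90]))
library_b = dict(zip(genes, [90, 50, 33, 20, 12, 8, 5, 0, 0, 0]))

ranks_a = quintile_ranks(library_a)
r_e = averaged_expression_rank([library_a, library_b])
```

In `library_a` the three zero values would nominally fall into ranks 1, 1 and 2. Because they are tied, all three receive rank 1, which mirrors the rule that equal values split by a cutoff go to the lower group.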