0168-9525/$ - see front matter © 2004 Elsevier Ltd. All rights reserved. doi:10.1016/j.tig.2004.07.009

Conserved regulatory motifs in bacteria: riboswitches and beyond

Cei Abreu-Goodger, Nancy Ontiveros-Palacios, Ricardo Ciria and Enrique Merino

Departamento de Microbiología Molecular, Instituto de Biotecnología, Universidad Nacional Autónoma de México, Cuernavaca, 62210 Morelos, México

We present a computational approach that identifies regulatory elements conserved across phylogenetically distant organisms. Intergenic regulatory regions were clustered by orthology of the adjacent genes, and an iterative process was then applied to search for significant motifs, adding new elements of the putative regulon in each cycle. With this approach, we identified highly conserved riboswitches and the Gram-positive T-box. Interestingly, we also identified many other regulatory systems that appear to depend on conserved RNA structures.

Comparative genomic approaches are central to analyzing the growing number of whole-genome sequences. Although using this kind of analysis to find regulatory elements is not new, the focus has usually been on a single genome or a group of closely related genomes [1–3], because functional intergenic regions (promoters, protein-binding sites) are usually poorly conserved and diverge quickly. It came as a surprise to many scientists when specific RNA elements, termed ‘riboswitches’, were shown to regulate gene expression by directly sensing a metabolite without the intervention of a protein [4]. Riboswitches have since been shown to be involved in various metabolic processes, including thiamine, riboflavin, cobalamin, adenine, guanine and lysine biosynthesis [5–11]. We assumed that this type of regulatory sequence would be easy to identify, given its broad phylogenetic distribution and highly conserved nature.
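The iterative search summarized in the abstract (seed motifs from one orthologous cluster, a scan of all upstream regions, and new putative regulon members added in each cycle) can be illustrated with a toy seed-and-extend loop. This is a minimal sketch under stated assumptions, not the authors' pipeline: MEME and MAST are replaced by a hand-built log-odds position weight matrix (PWM) and a fixed bit-score cutoff, only the forward strand is scanned, RNA secondary structure is ignored, and the names `build_pwm`, `best_hit` and `expand_regulon`, the threshold, the toy sequences and the convergence rule are all hypothetical.

```python
import math
from typing import Dict, List, Set, Tuple

ALPHABET = "ACGT"
Pwm = List[Dict[str, float]]


def build_pwm(sites: List[str], pseudocount: float = 0.5) -> Pwm:
    """Log-odds PWM (bits) from equal-length aligned sites, uniform background."""
    pwm: Pwm = []
    for i in range(len(sites[0])):
        counts = {base: pseudocount for base in ALPHABET}
        for site in sites:
            counts[site[i]] += 1
        total = sum(counts.values())
        pwm.append({b: math.log2(counts[b] / total / 0.25) for b in ALPHABET})
    return pwm


def best_hit(pwm: Pwm, seq: str) -> Tuple[float, str]:
    """Highest-scoring window of seq (forward strand only, ACGT windows only)."""
    width = len(pwm)
    best: Tuple[float, str] = (-math.inf, "")
    for start in range(len(seq) - width + 1):
        window = seq[start:start + width]
        if any(base not in ALPHABET for base in window):
            continue  # skip windows containing N or other ambiguity codes
        score = sum(col[base] for col, base in zip(pwm, window))
        if score > best[0]:
            best = (score, window)
    return best


def expand_regulon(seed_sites: List[str], upstream: Dict[str, str],
                   threshold: float, max_rounds: int = 10) -> Set[str]:
    """Hypothetical seed-and-extend loop: scan, add hits, re-estimate, repeat."""
    members: Set[str] = set()
    sites = list(seed_sites)
    for _ in range(max_rounds):
        pwm = build_pwm(sites)
        hits = {}
        for gene, seq in upstream.items():
            score, window = best_hit(pwm, seq.upper())
            if score >= threshold:
                hits[gene] = window
        if set(hits) <= members:
            break  # no new putative regulon members: converged
        members |= set(hits)
        # Re-estimate the motif from the seeds plus every site hit this round.
        sites = list(seed_sites) + list(hits.values())
    return members


# Toy upstream regions (hypothetical data, not from the study).
upstream = {
    "geneA": "GGGTTGACAGGG",  # exact match to the seed
    "geneB": "CCTTGACTCC",    # one mismatch
    "geneE": "CCTTGATTCC",    # two mismatches: found only after re-estimation
    "geneC": "AAAAAAAAAA",
    "geneD": "GCGCGCGCGC",
}
seeds = ["TTGACA"] * 3
print(sorted(expand_regulon(seeds, upstream, threshold=4.0, max_rounds=1)))
# -> ['geneA', 'geneB']
print(sorted(expand_regulon(seeds, upstream, threshold=4.0)))
# -> ['geneA', 'geneB', 'geneE']
```

With a single round only geneA and geneB pass the cutoff; re-estimating the matrix from their sites lets the weaker geneE site pass in the second round, which is the point of iterating. A realistic search over thousands of upstream regions would also need to control false positives, which MAST addresses with match p-values and E-values rather than a raw score cutoff.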
Corresponding author: Enrique Merino (merino@ibt.unam.mx). Available online 19 August 2004.
Update, TRENDS in Genetics Vol.20 No.10, October 2004, p. 475 (www.sciencedirect.com)

Searching for interesting motifs

The starting point for our work is a set of orthologous regulatory regions. To obtain these, we used the Clusters of Orthologous Groups (COG) of proteins database (http://www.ncbi.nlm.nih.gov/COG/) [12] together with operon predictions based on intergenic distances [13]. In this way, every protein from 164 fully sequenced bacterial genomes that was associated with a COG was assigned to the intergenic minimal upstream region (iMUR) of the first gene of its predicted operon. To avoid over-representation of similar sequences from related genomes, redundant sequences were eliminated. We obtained ~4000 clusters of orthologous regulatory regions, each belonging to a different COG. We used the public-domain motif discovery tool Multiple EM for Motif Elicitation (MEME) [14] to find a set of over-represented ‘seed motifs’ for each COG (Figure 1a). These motifs were used to identify other members of the putative regulon by searching all upstream regions with MEME's counterpart, the Motif Alignment and Search Tool (MAST) [15]. As a result of this