Original article

Motif prediction in ribosomal RNAs
Lessons and prospects for automated motif prediction in homologous RNA molecules

N.B. Leontis a,*, J. Stombaugh a, E. Westhof b

a Chemistry Department and Center for Biomolecular Sciences, Overman Hall, Bowling Green State University, Bowling Green, OH 43403, USA
b Institut de biologie moléculaire et cellulaire du CNRS, UPR 9002, modélisation et simulations des acides nucléiques, Université Louis-Pasteur, 15, rue René-Descartes, 67084 Strasbourg cedex, France

Received 5 July 2002; accepted 9 July 2002

Abstract

The traditional way to infer RNA secondary structure involves an iterative process of alignment and evaluation of covariation statistics between all positions possibly involved in basepairing. Watson–Crick basepairs typically show covariations that score well when examples of two or more possible basepairs occur. This is not necessarily the case for non-Watson–Crick basepairing geometries. For example, for sheared (trans Hoogsteen/Sugar edge) pairs, one base is highly conserved (always A or mostly A with some C or U), while the other can vary (G or A and sometimes C and U as well). RNA motifs consist of ordered, stacked arrays of non-Watson–Crick basepairs that in the secondary structure representation form hairpin or internal loops, multi-stem junctions, and even pseudoknots. Although RNA motifs occur recurrently and contribute in a modular fashion to RNA architecture, it is usually not apparent which bases interact and whether it is by edge-to-edge H-bonding or solely by stacking interactions. Using a modular sequence-analysis approach, recurrent motifs related to the sarcin–ricin loop of 23S RNA and to loop E from 5S RNA were predicted in universally conserved regions of the large ribosomal RNAs (16S- and 23S-like) before the publication of high-resolution, atomic-level structures of representative examples of 16S and 23S rRNA molecules in their native contexts.
This provides the opportunity to evaluate the predictive power of motif-level sequence analysis, with the goal of automating the process for predicting RNA motifs in genomic sequences. The process of inferring structure from sequence by constructing accurate alignments is a circular one. The crucial link that allows a productive iteration of motif modeling and realignment is the comparison of the sequence variations for each putative pair with the corresponding isostericity matrix, to determine which basepairs are consistent with both the sequence and the geometrical data.

© 2002 Société française de biochimie et biologie moléculaire / Éditions scientifiques et médicales Elsevier SAS. All rights reserved

Keywords: RNA motif; Non-Watson–Crick basepair; Sugar-edge; Hoogsteen edge

1. Introduction

The new high-resolution structures of the ribosomal subunits confirm that the ribosomal RNA molecules (5S, 16S, and 23S) comprise a number of recurrent, modular motifs that mediate RNA–RNA, RNA–protein, and even RNA–drug interactions [1,2]. For our purposes, RNA motifs are ordered, stacked arrays of non-Watson–Crick basepairs that in the secondary structure representation form hairpin or internal loops, multi-stem junctions, and even pseudoknots. We predicted that certain motifs, such as the sarcin/ricin loop motif of 23S rRNA and the bacterial loop E motif of 5S rRNA, occur autonomously in a variety of other contexts within the ribosome [3,4]. The method that we employed to make these predictions entails the following steps:
(1) The secondary structure is used to identify regions forming internal, hairpin, or junction loops.
(2) A consensus profile is constructed for each strand in these loops and the consensus is checked against the consensus for the given motif (here, the sarcin/ricin motif).
(3) In promising regions, putative pairs are identified and the sequence variations in homologous sequences are compiled for those positions, paying attention to the possibility of misalignment of sequences in the database.
(4) The se-

Abbreviations: W.C., Watson–Crick; S.E., Sugar-edge
* Corresponding author. Tel.: +1-419-372-8663; fax: +1-419-372-9809. E-mail address: leontis@bgnet.bgsu.edu (N.B. Leontis).
Biochimie 84 (2002) 961–973
PII: S0300-9084(02)01463-3
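Steps (2) and (3) of the procedure listed in the Introduction (building a per-strand consensus, checking it against a motif consensus, and compiling the sequence variations at a putative pair) can be illustrated with a minimal Python sketch. This is not the authors' implementation. The alignment is toy data invented for the example, the motif pattern is hypothetical rather than the published sarcin/ricin consensus, and the function names and the `min_fraction` threshold are assumptions made here.

```python
from collections import Counter

# Toy alignment invented for illustration; NOT real rRNA data.
# Each string is one homologous sequence, already aligned ('-' would be a gap).
TOY_ALIGNMENT = [
    "GAGUAC",
    "GAGUAC",
    "GAAUAC",
    "GCGUAU",
]


def column_profile(alignment, col):
    """Count the bases observed in one alignment column, skipping gaps."""
    return Counter(seq[col] for seq in alignment if seq[col] != "-")


def consensus(alignment, cols, min_fraction=0.75):
    """Consensus over the given columns.

    A position gets its dominant base if that base reaches min_fraction
    (an assumed threshold) of the non-gap sequences, otherwise 'N'.
    """
    out = []
    for col in cols:
        profile = column_profile(alignment, col)
        total = sum(profile.values())
        if total == 0:  # column holds only gaps
            out.append("N")
            continue
        base, count = profile.most_common(1)[0]
        out.append(base if count / total >= min_fraction else "N")
    return "".join(out)


def matches_motif(strand_consensus, motif_pattern):
    """True if every non-'N' position of motif_pattern equals the strand consensus."""
    if len(strand_consensus) != len(motif_pattern):
        return False
    return all(m == "N" or m == s
               for s, m in zip(strand_consensus, motif_pattern))


def pair_variations(alignment, i, j):
    """Tally the base combinations seen at a putative pair (columns i and j)."""
    return Counter(seq[i] + seq[j] for seq in alignment
                   if "-" not in (seq[i], seq[j]))


if __name__ == "__main__":
    strand = consensus(TOY_ALIGNMENT, range(6))
    print("strand consensus:", strand)
    # "NAGUAN" is a hypothetical motif pattern used only for this example.
    print("matches pattern:", matches_motif(strand, "NAGUAN"))
    print("pair (1, 5) variations:", dict(pair_variations(TOY_ALIGNMENT, 1, 5)))
```

The tally returned by `pair_variations` is the kind of per-pair list of observed base combinations that the abstract describes comparing against an isostericity matrix. That comparison is left out here because the matrices themselves are not given in this part of the text.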