Original article
Motif prediction in ribosomal RNAs
Lessons and prospects for automated motif prediction in homologous RNA molecules
N.B. Leontis (a,*), J. Stombaugh (a), E. Westhof (b)
(a) Chemistry Department and Center for Biomolecular Sciences, Overman Hall, Bowling Green State University, Bowling Green, OH 43403, USA
(b) Institut de biologie moléculaire et cellulaire du CNRS, UPR 9002, modélisation et simulations des acides nucléiques, Université Louis-Pasteur, 15, rue René-Descartes, 67084 Strasbourg cedex, France
Received 5 July 2002; accepted 9 July 2002
Abstract
The traditional way to infer RNA secondary structure involves an iterative process of alignment and evaluation of covariation statistics
between all positions possibly involved in basepairing. Watson–Crick basepairs typically show covariations that score well when two or more different
basepairs (e.g. G–C and A–U) occur at the same pair of positions. This is not necessarily the case for non-Watson–Crick basepairing geometries. For example, for sheared
(trans Hoogsteen/Sugar edge) pairs, one base is highly conserved (always A or mostly A with some C or U), while the other can vary (G or
A and sometimes C and U as well). RNA motifs consist of ordered, stacked arrays of non-Watson–Crick basepairs that in the secondary
structure representation form hairpin or internal loops, multi-stem junctions, and even pseudoknots. Although RNA motifs occur recurrently
and contribute in a modular fashion to RNA architecture, it is usually not apparent which bases interact and whether they do so by edge-to-edge
H-bonding or solely by stacking. Using a modular sequence-analysis approach, recurrent motifs related to the sarcin–ricin loop
of 23S RNA and to loop E from 5S RNA were predicted in universally conserved regions of the large ribosomal RNAs (16S- and 23S-like)
before the publication of high-resolution, atomic-level structures of representative examples of 16S and 23S rRNA molecules in their native
contexts. This provides the opportunity to evaluate the predictive power of motif-level sequence analysis, with the goal of automating the
process for predicting RNA motifs in genomic sequences. The process of inferring structure from sequence by constructing accurate
alignments is a circular one. The crucial link that allows a productive iteration of motif modeling and realignment is the comparison of the
sequence variations for each putative pair with the corresponding isostericity matrix to determine which basepairs are consistent with both the
sequence and the geometrical data. © 2002 Société française de biochimie et biologie moléculaire / Éditions scientifiques et médicales
Elsevier SAS. All rights reserved
Keywords: RNA motif; Non-Watson–Crick basepair; Sugar-edge; Hoogsteen edge
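The following Python sketch makes the abstract's contrast concrete. It scores two toy alignment columns in two ways. The first is mutual information, a common covariation statistic; it is assumed here, since the abstract does not name a specific score. The second is the fraction of sequences whose pair falls within a single isosteric set. Both columns and the tHS set are invented for illustration and are not the published isostericity matrix. Only the four canonical cis Watson–Crick pairs (G–C, C–G, A–U, U–A) are treated as a known isosteric group. All function and variable names are hypothetical.

```python
import math
from collections import Counter

# Toy alignment columns, invented for illustration (not data from this article).
# Each tuple holds the nucleotides found at two aligned positions in one sequence.
WC_COLUMN = [("G", "C"), ("C", "G"), ("A", "U"), ("U", "A"), ("G", "C"), ("C", "G")]
# Sheared-like column: first (Hoogsteen-edge) base conserved as A, second base varies (G or A).
SHEARED_COLUMN = [("A", "G"), ("A", "G"), ("A", "A"), ("A", "G"), ("A", "A"), ("A", "G")]

# Canonical cis Watson-Crick/Watson-Crick pairs, which form one isosteric group.
CWW_ISOSTERIC = {("G", "C"), ("C", "G"), ("A", "U"), ("U", "A")}
# Hypothetical subset for trans Hoogsteen/Sugar edge pairs (Hoogsteen base first),
# chosen to match the abstract's description; NOT the published isostericity matrix.
THS_ISOSTERIC_TOY = {("A", "G"), ("A", "A")}


def mutual_information(column):
    """Covariation score in bits: sum of f(xy) * log2(f(xy) / (f(x) * f(y)))."""
    n = len(column)
    f_xy = Counter(column)
    f_x = Counter(x for x, _ in column)
    f_y = Counter(y for _, y in column)
    mi = 0.0
    for (x, y), count in f_xy.items():
        p_xy = count / n
        mi += p_xy * math.log2(p_xy / ((f_x[x] / n) * (f_y[y] / n)))
    return mi


def isosteric_fraction(column, isosteric_set):
    """Fraction of sequences whose base combination lies in one isosteric set."""
    return sum(pair in isosteric_set for pair in column) / len(column)


if __name__ == "__main__":
    for name, column in [("WC-like", WC_COLUMN), ("sheared-like", SHEARED_COLUMN)]:
        print(f"{name}: MI = {mutual_information(column):.3f} bits, "
              f"cWW fraction = {isosteric_fraction(column, CWW_ISOSTERIC):.2f}, "
              f"tHS (toy) fraction = {isosteric_fraction(column, THS_ISOSTERIC_TOY):.2f}")
```

In the sheared-like column one base is always A. That column therefore has zero mutual information, even though every sequence is consistent with a single isosteric set. This is the situation the abstract describes for non-Watson–Crick pairs, and it is why the sketch checks each putative pair against an isosteric set rather than relying on covariation alone.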
1. Introduction
The new high-resolution structures of the ribosomal
subunits confirm that the ribosomal RNA molecules (5S,
16S, and 23S) comprise a number of recurrent, modular
motifs that mediate RNA–RNA, RNA–protein, and even
RNA–drug interactions [1,2]. For our purposes, RNA motifs
are ordered, stacked arrays of non-Watson–Crick basepairs
that in the secondary structure representation form hairpin
or internal loops, multi-stem junctions, and even
pseudoknots. We predicted that certain motifs, such as the
sarcin/ricin loop motif of 23S rRNA and the bacterial loop
E motif of 5S rRNA, occur autonomously in a variety of
other contexts within the ribosome [3,4]. The method that
we employed to make these predictions entails the following
steps: (1) The secondary structure is used to identify
regions forming internal, hairpin, or junction loops. (2) A
consensus profile is constructed for each strand in these
loops, and the consensus is checked against the consensus
for the given motif (here, the sarcin/ricin motif). (3) In
promising regions, putative pairs are identified and the
sequence variations in homologous sequences are compiled
for those positions, paying attention to the possibility of
misalignment of sequences in the database. (4) The se-
Abbreviations: W.C., Watson–Crick; S.E., Sugar-edge
* Corresponding author. Tel.: +1-419-372-8663; fax: +1-419-372-9809.
E-mail address: leontis@bgnet.bgsu.edu (N.B. Leontis).
Biochimie 84 (2002) 961–973
PII: S0300-9084(02)01463-3