M. Winslett (Ed.): SSDBM 2009, LNCS 5566, pp. 200–216, 2009.
© Springer-Verlag Berlin Heidelberg 2009
Covariant Evolutionary Event Analysis for Base
Interaction Prediction Using a Relational Database
Management System for RNA
Weijia Xu¹, Stuart Ozer², and Robin R. Gutell³
¹ Texas Advanced Computing Center,
The University of Texas at Austin, Austin, Texas, USA
xwj@tacc.utexas.edu
² One Microsoft Way Redmond, WA., Seattle, Washington, USA
stuarto@microsoft.com
³ Center of Computational Biology and Bioinformatics
The University of Texas at Austin, Austin, Texas, USA
Robin.gutell@icmb.utexas.edu
Abstract. With an increasingly large number of properly aligned sequences,
comparative sequence analysis can accurately identify not only common
structures formed by standard base pairing but also new types of structural
elements and constraints. However, traditional methods are too computationally
expensive to perform well on large-scale alignments and are less effective on
sequences from diverse phylogenetic classifications. We propose a new
approach that utilizes coevolutionary rates among pairs of nucleotide positions,
using the phylogenetic and evolutionary relationships of the organisms of the
aligned sequences. With a novel data schema to manage relevant information
within a relational database, our method, implemented with Microsoft SQL Server
2005, showed 90% sensitivity in identifying base pair interactions among 16S
ribosomal RNA sequences from Bacteria, at a scale 40 times larger and with 50%
better sensitivity than a previous study. The results also indicated covariation
signals for a few sets of cross-strand base stacking pairs in secondary structure
helices, and other subtle constraints in the RNA structure.
Keywords: Biological database, Bioinformatics, Sequence Analysis, RNA.
1 Introduction
Comparative sequence analysis has been successfully utilized to identify RNA
structures that are common to different families of properly aligned RNA sequences.
Here we enhance the capabilities of relational database management for
comparative sequence analysis through an extended data schema and integrative
analysis routines. The novel data schema establishes a foundation for analyzing
multiple dimensions of RNA sequence, sequence alignment, different aspects of 2D
and 3D RNA structure information, and phylogenetic/evolutionary information. The
integrative analysis routines are unique, scale to large volumes of data, and provide
better accuracy and performance. With these database enhancements, we detail the