ARNA: Interactive Comparison and Alignment of RNA Secondary Structure

Gerald Gainant*
LaBRI UMR 5800, University of Bordeaux 1, FRANCE

David Auber†
LaBRI UMR 5800, University of Bordeaux 1, FRANCE

* e-mail: gainant@labri.fr
† e-mail: auber@labri.fr

Figure 1: Comparing the secondary structure of two RNA sequences with ARNA.

ABSTRACT

ARNA is an interactive visualization system that supports comparison and alignment of RNA secondary structure. We present a new approach to RNA alignment that exploits the complex structure of the Smith-Waterman local distance matrix, allowing people to explore the space of possible partial alignments to discover a good global solution. The modular software architecture separates the user interface from computation, so that different alignment algorithms can be incorporated into the same framework.

CR Categories: D.2.2 [Software Engineering]: Tools and Techniques - User interfaces; G.2.1 [Discrete Mathematics]: Combinatorics - Combinatorial algorithms; J.3 [Life and Medical Sciences]: Biology and genetics

Keywords: visualization, combinatorics, bioinformatics, graph drawing, sequence alignment, RNA

1 INTRODUCTION

ARNA is a new open-source system that provides support for biologists and bioinformaticians who need to compare the RNA secondary structures of two different organisms [3, 5]. The structure of RNA is often studied at three levels: the primary structure comprises the linear string of nucleotides; the secondary structure is created when some nucleotides in the sequence bond to others, forming a two-dimensional structure; finally, the tertiary structure is formed when the sequence folds into a shape in three-dimensional space. ARNA always shows the primary RNA structure, and has side-by-side windows for showing the secondary or tertiary structures. ARNA is built within the open-source framework Tulip [2] (see footnote 1).
1 http://www.tulip-software.org

Previous systems for secondary structure visualization, such as RnaViz [8] and Vienna [6], suffer from a lack of stability: small changes in the RNA primary structure may drastically change their drawing of its secondary structure. ARNA uses a scalable and stable tree-based drawing algorithm that relies on a heuristic to locate and anchor quasi-isomorphic subgraphs shared between the two sequences; this algorithm is discussed in previous work [3]. We focus here on the problem of aligning RNA.

2 ALIGNING RNA

The problem of multiple sequence alignment has been well studied [4]. The RNA primary structure is a sequence $R$ of nucleotides, represented as a word of length $n$ on the alphabet $\{A, C, G, U\}$: $R = r_1 r_2 \ldots r_n$. Let $W_R$ be the set of all the subwords of $R$. Let $I_n$ be the set of all the sub-intervals on $[1, n]$:

$$I_n = \{\, [i, j] : (i, j) \in [1, n]^2,\ i \le j \,\}$$

Then $I_n \times I_m$ is the set of all possible matches between the two sequences of length $n$ and $m$ respectively. We would like to find the similarity set, namely the subset that contains good matches.

Matrix Interpretation. A prior approach by Smith and Waterman to finding similarities between two RNA sequences used a well-known local distance matrix built using the Levenshtein distance metric [9]. They used this matrix to compute the longest possible match between the two sequences with the following heuristic: compute a score function $\varsigma : W_{R_0} \times W_{R_1} \to \mathbb{R}$, find the maximum score, and then backtrack along the path that reached this score by reversing the computation. However, we noticed after implementing this algorithm in ARNA that this single longest possible match is far from the best match. We can instead find several good