BIOINFORMATICS Vol. 00 no. 00 2010 Pages 1–2

PLEXY: Efficient Target Prediction for Box C/D snoRNAs

Stephanie Kehr 1,*, Sebastian Bartschat 1, Peter F. Stadler 1-5, Hakim Tafer 1

1 Bioinformatics Group, Department of Computer Science, and Interdisciplinary Center for Bioinformatics, University of Leipzig, Härtelstrasse 16-18, D-04107 Leipzig, Germany
2 Inst. f. Theoretical Chemistry, University of Vienna, Währingerstrasse 17, A-1090 Vienna, Austria
3 Max Planck Institute for Mathematics in the Sciences, Inselstrasse 22, D-04103 Leipzig, Germany
4 RNomics Group, Fraunhofer Institute for Cell Therapy and Immunology, Perlickstraße 1, D-04103 Leipzig, Germany
5 The Santa Fe Institute, 1399 Hyde Park Rd., Santa Fe, New Mexico, USA

Received on XXXXX; revised on XXXXX; accepted on XXXXX

Associate Editor: XXXXXXX

ABSTRACT

Motivation: Small nucleolar RNAs (snoRNAs) are an abundant class of non-coding RNAs with a wide variety of cellular functions, including chemical modification of RNA, telomere maintenance, pre-rRNA processing, and regulatory activities in alternative splicing. The main role of box C/D snoRNAs is to determine the targets for 2'-O-ribose methylation, which is important for rRNA maturation and for the splicing regulation of some mRNAs. For many "orphan" snoRNAs, however, the targets are still unknown. While a fast and efficient target predictor is available for box H/ACA snoRNAs, no comparable tool exists for box C/D snoRNAs, even though they bind to their targets in a much less complex manner.

Results: PLEXY is a dynamic programming algorithm that computes thermodynamically optimal interactions of a box C/D snoRNA with a putative target RNA. Implemented as a scanner for large input sequences and equipped with filters on the duplex structure, PLEXY is an efficient and reliable tool for the prediction of box C/D snoRNA target sites.
Availability: The source code of PLEXY is freely available at http://www.bioinf.uni-leipzig.de/Software/PLEXY

Contact: steffi@bioinf.uni-leipzig.de

* To whom correspondence should be addressed.

1 INTRODUCTION

Box C/D snoRNAs are mainly involved in 2'-O-ribose methylation of specific nucleotides in ribosomal and spliceosomal RNAs (Terns & Terns, 2002). The targeted position is located exactly 5 nucleotides upstream of the 5' end of the D or D' box and is determined by sequence-specific hybridization (Fig. 1A). The base-pairing region has a length of 7-20 nt and exhibits a simple structure consisting of stacked base pairs and only a few mismatches. In particular, bulges are absent (Ni et al., 1997).

Recently, an efficient and reliable tool for predicting the much more complex interactions of H/ACA snoRNAs with their targets has become available (Tafer et al., 2010); it is based on the thermodynamic principles of RNA folding. No comparable approach is currently available for the simple C/D snoRNA-RNA duplexes. snoTarget (Bazeley et al., 2008), at present the only computer program devoted to C/D snoRNA target prediction, employs pattern matching to find candidates, which are then ranked by the co-folding energy of snoRNA and target as computed by RNAcofold (Bernhart et al., 2006). In contrast, PLEXY directly computes the interaction energies by means of dynamic programming.

2 RESULTS

The PLEXY Algorithm PLEXY takes a snoRNA sequence with annotated box motifs and a list of potential target RNAs as input. First, the 20 nt sequence segments upstream of the D and D' boxes are extracted as putative interaction regions. PLEXY then calls the RNAplex algorithm to compute stable duplexes of the snoRNA antisense region and the putative targets. RNAplex is a fast folding algorithm for unbranched RNA structures that uses a linearized energy model to achieve linear runtime behavior (Tafer & Hofacker, 2008).
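The extraction step described above can be illustrated with a minimal Python sketch. It is not taken from the PLEXY source code: the function names, the 0-based box coordinates, the toy sequence, and the way "5 nucleotides upstream" is counted are all assumptions made for illustration only.

```python
# Illustrative sketch only; names and coordinate conventions are assumptions,
# not part of the PLEXY implementation.

def antisense_region(sno_seq: str, box_start: int, max_len: int = 20) -> str:
    """Return up to max_len nt located directly upstream of an annotated box.

    box_start is the (assumed) 0-based index of the first nucleotide
    of the D or D' box within sno_seq.
    """
    if not 0 <= box_start <= len(sno_seq):
        raise ValueError("box_start lies outside the snoRNA sequence")
    start = max(0, box_start - max_len)  # truncate if the box is near the 5' end
    return sno_seq[start:box_start]


def guide_index(box_start: int, offset: int = 5) -> int:
    """0-based index of the snoRNA nucleotide offset nt upstream of the box.

    Assumed counting: the nucleotide immediately upstream of the box is
    number 1, so the fifth one sits at box_start - 5.
    """
    idx = box_start - offset
    if idx < 0:
        raise ValueError("box is too close to the 5' end of the snoRNA")
    return idx


# Hypothetical toy snoRNA: 10 nt prefix, 20 nt antisense element, 4 nt box, 2 nt tail.
TOY_SNO = "GCAUGAUGAC" + "UUACGGAUCCGAUUCGAUAC" + "CUGA" + "GC"
TOY_BOX_START = 30  # assumed annotation: the box begins at index 30

region = antisense_region(TOY_SNO, TOY_BOX_START)
guide_nt = TOY_SNO[guide_index(TOY_BOX_START)]
```

In this sketch, `region` holds the candidate interaction segment that would be passed on to the duplex computation, and `guide_nt` is the snoRNA nucleotide assumed to pair with the methylated target residue.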
The list of duplexes is then filtered using the rules compiled by Chen et al. (2007): (a) the interaction should be at least 7 nt long, (b) no bulges are allowed, (c) the core duplex region contains at most one mismatch, and (d) the methylated residue forms a Watson-Crick pair. Finally, the putative target sites are ranked by the computed duplex energies.

Runtime The CPU requirements of PLEXY scale linearly with the length of the target sequence. It scans 10^6 nucleotides of target sequence in 19 s on a 2.66 GHz Intel processor (Q9400). This is only four times slower than the pattern search algorithm employed by snoTarget.

© Oxford University Press 2010.

Accuracy To compare the performance of PLEXY and snoTarget, we used a collection of experimentally verified snoRNA-rRNA interactions from yeast (Lowe & Eddy, 1999) and human (Lestrade & Weber, 2006), together with the yeast (Samarsky & Fournier, 1999) and human (Lestrade & Weber, 2006) snoRNA and rRNA sequences. In the yeast dataset, PLEXY correctly predicted all 50 target sites, with 49 (98%) ranked first. In contrast,