A bioinformatics approach for investigating the determinants of Drosha processing

Nestoras Karathanasis, Ioannis Tsamardinos and Panayiota Poirazi

Manuscript received November 30, 2013. This research has been co-financed by the European Union (European Social Fund – ESF) and Greek national funds through the Operational Program "Education and Lifelong Learning" of the National Strategic Reference Framework (NSRF) - Research Funding Program: Heracleitus II. Investing in knowledge society through the European Social Fund.
Panayiota Poirazi is with the Institute of Molecular Biology and Biotechnology, FORTH, Greece 70013 (corresponding author, phone: 00302810391139; e-mail: poirazi@imbb.forth.gr).
Nestoras Karathanasis is with the Institute of Molecular Biology and Biotechnology, FORTH, Greece 70013 and with the Department of Biology, University of Crete, Heraklion, 71409, Greece (e-mail: nk3932@hotmail.com).
Ioannis Tsamardinos is with the Department of Computer Science, University of Crete, Heraklion, 71409, Greece and with the Institute of Computer Science, FORTH, Heraklion, 70013, Greece (e-mail: tsamard@ics.forth.gr).

Abstract— We use a bioinformatics approach to search for the biological features that determine the cleavage site of the Microprocessor complex (or Drosha) within known miRNA hairpins. Towards this goal, we employ a previously developed methodology, termed DuplexSVM, which can accurately identify the four ends of a miRNA:miRNA* duplex. Here we use DuplexSVM to study how Drosha determines its cleavage site. We perform in silico mutagenesis experiments on 142 hairpins, adding or removing matching nucleotides to change the distance of the Drosha site from either the loop tip or the stem – single-stranded tails junction. Our results suggest that the Drosha cleavage site is mainly determined by its distance from the terminal loop tip.

I. INTRODUCTION

MicroRNAs are small, ~22 nucleotides long, single-stranded non-coding RNAs that play an important regulatory role in both animals and plants by binding at target sites on messenger RNAs (mRNAs), leading to mRNA cleavage or translational repression [1]. The primary transcripts of microRNA genes are called primary miRNAs (pri-miRNAs) and consist of a stem-loop ("hairpin") structure extended with long single-stranded tails. In animals, the tails are detached by the Microprocessor complex, whose core component is the RNase III enzyme Drosha, leaving a hairpin-shaped, ~60-70 nt long intermediate with a characteristic 3' overhang of ~2 nt, the miRNA precursor (pre-miRNA).

Two models have been proposed for how a pri-miRNA is processed to produce a pre-miRNA. According to the first model, Drosha, or the holoenzyme with Drosha providing the catalytic activity, selects an RNA hairpin bearing a terminal loop that is no less than 10 nucleotides long, and cuts ~22 nucleotides from the terminal loop – stem junction to produce a pre-miRNA [2]. According to the second model, the cleavage site is determined mainly by the distance (~11 base pairs) from the stem – single-stranded tails junction [3]. It was recently found that some pre-miRNAs (the so-called mirtrons) have a structure similar to that of regular pre-miRNAs, but enter the miRNA pathway without undergoing processing by Drosha, i.e. without passing through the pri-miRNA stage [4].

Irrespective of its production process, the pre-miRNA is then exported to the cytoplasm, where it is processed by another RNase III enzyme, termed Dicer. Dicer cleaves the pre-miRNA at a certain distance (~22 nt) from the overhang created by the Microprocessor [5], leaving an RNA duplex with 3' overhangs of ~2 nt, called the miRNA:miRNA* duplex. For each individual duplex, one (or both) of its strands ends up as the mature miRNA and is loaded into a RISC (RNA-induced Silencing Complex), where it performs its regulatory functions on target mRNA. The other strand, called miRNA*, is degraded. It may also be the case that both strands of the duplex correspond to a mature miRNA: only one strand becomes the miRNA each time, but the two strands are selected with similar frequency [1].

Given the importance of miRNAs in gene regulation, several computational approaches have been developed to complement experimental ones. Most of them focus on the discovery of novel miRNA genes or possible mRNA targets of known miRNAs [3], [4]. As part of miRNA gene discovery, these tools predict certain features of miRNAs such as the starting position of the mature miRNA [5-7], the Drosha cleavage site [8] (which coincides with the start of the mature miRNA on a pri-miRNA) or the mature miRNA molecule on hairpin precursors [9], [10], [11].

In a previous study, we introduced the problem of identifying the miRNA:miRNA* duplex as a first step in identifying the mature miRNA. We adopted this approach because (a) the duplex is a necessary stage of miRNA biogenesis and (b) given the duplex, it is relatively easy to determine experimentally whether both, or which one, of the two duplex strands results in the mature miRNA(s). We showed that our tool significantly outperformed the state-of-the-art tool MaturePred, as well as a Trivial locator, when assessed on a common blind test set [6].

Here, we use DuplexSVM to investigate the effects of mutagenesis on Drosha processing, leading to several experimentally testable predictions. In silico mutagenesis experiments performed on 142 hairpins suggest that both the