miRSeqNovel: An R based workflow for analyzing miRNA sequencing data

Kui Qian a,*, Eeva Auvinen b,c, Dario Greco d, Petri Auvinen a

a DNA Sequencing and Genomics Laboratory, Institute of Biotechnology, University of Helsinki, Helsinki, Finland
b Haartman Institute, Department of Virology, Helsinki, Finland
c Helsinki University Hospital Laboratory, Department of Virology and Immunology, Helsinki, Finland
d Department of Bioscience and Nutrition, Karolinska Institutet, Sweden

Article history: Received 28 February 2012; received in revised form 4 May 2012; accepted 4 May 2012; available online 17 May 2012

Keywords: miRNA sequencing; Differentially expressed miRNAs; Novel miRNA prediction

Abstract

We present miRSeqNovel, an R based workflow for miRNA sequencing data analysis. miRSeqNovel can process both colorspace (SOLiD) and basespace (Illumina/Solexa) data by different mapping algorithms. It finds differentially expressed miRNAs and gives conservative prediction of novel miRNA candidates with customized parameters. miRSeqNovel is freely available at http://sourceforge.net/projects/mirseq/files. © 2012 Elsevier Ltd. All rights reserved.

With the advantages of next-generation sequencing (NGS) platforms, new opportunities have arisen to quantitate the expression of known miRNAs as well as to predict novel miRNAs. A number of tools have been developed for the analysis of miRNA sequencing data, such as the stand-alone software miRDeep2 [1], or web-based tools like miRanalyzer [2] and UEA sRNA toolkit [3] (the latter two also provide stand-alone versions). However, there are limitations in these methods. First, some of them are designed to support basespace format only, such as miRDeep2. Second, methods such as miRDeep2 and miRanalyzer have their mandatory mapping method (i.e. Bowtie [4]) and they do not allow utilization of custom mapping methods or mapping parameters.
Third, the web versions of miRanalyzer and UEA sRNA toolkit support only a limited number of reference genomes and restrict the number of input reads. Fourth, miRanalyzer and UEA sRNA toolkit depend on their built-in, outdated miRNA annotations for novel miRNA prediction, while miRDeep2 requires miRNA annotations of related species for novel miRNA discovery. Fifth, those methods offer limited options for adjusting prediction parameters. Such adjustment is needed to obtain reliable miRNA predictions, because pre-miRNA sequence lengths and the complementarity between mature and star miRNAs vary between species. An example is the length of the gap between the star and mature sequences: long (~400 nt) in plants and short (~40 nt) in mammals. This option, for instance, is lacking in miRDeep2. Moreover, those methods do not offer the flexibility to choose the statistical method used to find differentially expressed miRNAs; instead, they rely on a limited number of pre-defined methods.

Based on the above facts, we find the existing methods not flexible enough for processing miRNA sequencing data. We have developed a new workflow with improved flexibility and prediction accuracy for miRNA sequencing data processing, which we have named miRSeqNovel. miRSeqNovel can use the output of popular mapping software, e.g. RNA2MAP [5], Bowtie [4] or BWA [6], which can map sequencing data from multiple platforms, including SOLiD ("csfasta" format) and Illumina/Solexa ("fastq" format), to any reference genome. The genome mapping output is combined with the known miRNA information from miRBase [7] (Fig. 1A) to produce a table of read counts for known miRNAs. miRSeqNovel applies widely used statistical methods, such as functions implemented in the popular Bioconductor packages edgeR [8] and DESeq [9], or other user-specified methods, to discover differentially expressed miRNAs (Fig. 1B).
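The read-count table described above can be illustrated with a minimal sketch. It is written in Python for brevity and is not the miRSeqNovel implementation, which is R based. It assumes that mapped reads and known miRNA loci are both available as genomic intervals, and that a read is assigned to a miRNA when the two overlap on the same chromosome and strand. The names `Interval`, `overlaps` and `count_table` are hypothetical.

```python
from typing import Dict, List, NamedTuple


class Interval(NamedTuple):
    """A genomic interval; coordinates are 1-based and inclusive (assumption)."""
    chrom: str
    start: int
    end: int
    strand: str


def overlaps(a: Interval, b: Interval) -> bool:
    """True if a and b share at least one base on the same chromosome and strand."""
    return (a.chrom == b.chrom and a.strand == b.strand
            and a.start <= b.end and b.start <= a.end)


def count_table(samples: Dict[str, List[Interval]],
                mirnas: Dict[str, Interval]) -> Dict[str, Dict[str, int]]:
    """Return {miRNA name: {sample name: read count}}.

    A simple quadratic scan; a real workflow would use an indexed
    overlap search for large read sets.
    """
    table = {name: {sample: 0 for sample in samples} for name in mirnas}
    for sample, reads in samples.items():
        for read in reads:
            for name, locus in mirnas.items():
                if overlaps(read, locus):
                    table[name][sample] += 1
    return table
```

A table of this shape (one row per miRNA, one column per sample) is the kind of count matrix that packages such as edgeR and DESeq take as input.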
Next, in an optional step, reads mapped to known non-coding RNAs and exon regions are filtered out according to the Ensembl annotation [10]. Finally, the remaining reads are used as input for novel miRNA prediction (Fig. 1C). miRSeqNovel uses the mapped read information to find candidate miRNA precursor sequences by screening their secondary structures. By assigning different sets of prediction parameters optimized for animal and plant genomes, we demonstrated that miRSeqNovel can successfully predict most known miRNAs and find conservative novel candidates.

* Corresponding author. E-mail address: kui.qian@helsinki.fi (K. Qian).

Molecular and Cellular Probes 26 (2012) 208-211. doi:10.1016/j.mcp.2012.05.002
Journal homepage: www.elsevier.com/locate/ymcpr
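As an illustration of species-specific prediction parameters, the following sketch shows one way a gap-length setting could be expressed. It is written in Python and does not reproduce miRSeqNovel's actual parameters or code. The preset values reuse the approximate gap lengths quoted in the text (~40 nt in mammals, ~400 nt in plants). Treating them as upper bounds is a simplifying assumption, and the names `PredictionParams`, `PRESETS`, `gap_length` and `passes_gap_filter` are hypothetical.

```python
from dataclasses import dataclass
from typing import Tuple

Span = Tuple[int, int]  # (start, end) on the precursor, 1-based inclusive


@dataclass(frozen=True)
class PredictionParams:
    max_gap: int  # largest allowed distance (nt) between mature and star sequences


# Illustrative presets only; values follow the approximate figures in the text.
PRESETS = {
    "mammal": PredictionParams(max_gap=40),
    "plant": PredictionParams(max_gap=400),
}


def gap_length(mature: Span, star: Span) -> int:
    """Number of bases between the two sequences (0 if they touch or overlap)."""
    first, second = sorted([mature, star])
    return max(0, second[0] - first[1] - 1)


def passes_gap_filter(mature: Span, star: Span, params: PredictionParams) -> bool:
    """Keep a candidate only if its mature/star gap fits the chosen preset."""
    return gap_length(mature, star) <= params.max_gap
```

With these presets, a candidate whose mature and star sequences lie 278 nt apart would be rejected under the mammal setting but kept under the plant setting, which is the kind of difference a single fixed gap parameter cannot accommodate.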