Journal of Proteomics & Bioinformatics - Open Access
www.omicsonline.com Research Article JPB/Vol.2/August 2009
J Proteomics Bioinform Volume 2(8) : 336-343(2009) - 336
ISSN:0974-276X JPB, an open access journal
Swati Sinha¹*, T.S. Vasulu¹, and Rajat K. De²*
¹Biological Anthropology Unit, Indian Statistical Institute, Kolkata, India
²Machine Intelligence Unit, Indian Statistical Institute, Kolkata, India
*Corresponding authors: Swati Sinha, Biological Anthropology Unit, Indian Statistical Institute,
Kolkata-108, India, Tel: 91-33-25753215; E-mail: swati.6783@gmail.com
Rajat K De, Machine Intelligence Unit, Indian Statistical Institute, Kolkata-108, India
Tel: 91-33-25753105, Fax (O): +91-33-25753026, E-mail: rajat@isical.ac.in
Received July 02, 2009; Accepted August 11, 2009; Published August 12, 2009
Citation: Sinha S, Vasulu TS, De RK (2009) Performance and Evaluation of MicroRNA Gene Identification Tools.
J Proteomics Bioinform 2: 336-343. doi:10.4172/jpb.1000093
Copyright: © 2009 Sinha S, et al. This is an open-access article distributed under the terms of the Creative Commons
Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original
author and source are credited.
Abstract
MicroRNAs are small, single-stranded RNA molecules of ~22 nt in length that play an important role in
post-transcriptional gene regulation, either by translational repression or by cleavage of their target mRNAs. Since
their discovery, continuous efforts to identify miRNA genes have led to the discovery of several miRNAs in plants as
well as animals. Owing to the limitations of molecular genetic techniques for miRNA identification, computational
approaches were introduced for better and more affordable in silico miRNA prediction. Here, we compared several
miRNA gene identification tools, such as 'MiPred', 'Triplet-SVM', 'BayesMiRNAfind', 'OneClassmiRNAfind' and
'BayesSVMmiRNAfind', to evaluate their predictive performance on real and pseudo precursor miRNA datasets. Of all
the tools examined, MiPred is more sensitive (96%) than Triplet-SVM in identifying pseudo miRNAs for real/pseudo
miRNA classification, whereas for mature miRNA prediction the 'one-class' SVM classifier shows the best specificity
(96%) and BayesSVMmiRNAfind the lowest specificity (8%).
Keywords: MiPred; Triplet-SVM; BayesMiRNAfind; OneClassmiRNAfind; BayesSVMmiRNAfind; Sensitivity;
Specificity; Accuracy; Matthews Correlation Coefficient; Positive Predictive Value
Abbreviations: miRNA: MicroRNA; pre-miRNA: Precursor MicroRNA; HMM: Hidden Markov Model; SVM: Support
Vector Machine; PCA: Principal Component Analysis; K-NN: K-Nearest Neighbor; MCC: Matthews Correlation
Coefficient; PPV: Positive Predictive Value
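The evaluation measures listed above (sensitivity, specificity, accuracy, MCC and PPV) are all derived from the confusion matrix of a binary classifier. The following is a minimal Python sketch, assuming the standard textbook definitions with real miRNAs as the positive class and pseudo miRNAs as the negative class. The names `ConfusionCounts` and `evaluate` are hypothetical and are not part of any of the tools compared here; returning 0.0 when a denominator is zero is likewise an assumed convention (some software reports the metric as undefined instead).

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class ConfusionCounts:
    """Confusion-matrix counts; real miRNAs are the positive class (assumption)."""
    tp: int  # real miRNAs predicted as real
    tn: int  # pseudo miRNAs predicted as pseudo
    fp: int  # pseudo miRNAs wrongly predicted as real
    fn: int  # real miRNAs wrongly predicted as pseudo


def _safe_ratio(num: float, den: float) -> float:
    """Return num / den, or 0.0 when the denominator is zero (assumed convention)."""
    return num / den if den else 0.0


def evaluate(c: ConfusionCounts) -> dict:
    """Compute standard binary-classification metrics from confusion counts."""
    total = c.tp + c.tn + c.fp + c.fn
    # MCC denominator: square root of the product of the four marginal totals.
    mcc_den = math.sqrt((c.tp + c.fp) * (c.tp + c.fn) * (c.tn + c.fp) * (c.tn + c.fn))
    return {
        "sensitivity": _safe_ratio(c.tp, c.tp + c.fn),  # true positive rate
        "specificity": _safe_ratio(c.tn, c.tn + c.fp),  # true negative rate
        "accuracy": _safe_ratio(c.tp + c.tn, total),
        "ppv": _safe_ratio(c.tp, c.tp + c.fp),  # positive predictive value
        "mcc": _safe_ratio(c.tp * c.tn - c.fp * c.fn, mcc_den),
    }


if __name__ == "__main__":
    # Hypothetical counts for illustration only; not data from this study.
    metrics = evaluate(ConfusionCounts(tp=48, tn=45, fp=5, fn=2))
    for name, value in metrics.items():
        print(f"{name}: {value:.3f}")
```

With the hypothetical counts in the example (48 true positives, 2 false negatives, 45 true negatives, 5 false positives), sensitivity is 48/50 = 0.96 and specificity is 45/50 = 0.90. These numbers only illustrate the arithmetic and are not results reported for any of the tools.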
Introduction
Interest in miRNAs and their role as gene expression
regulators has been growing immensely (Clop et al., 2006;
Feng et al., 2009). The first such small regulator, the lin-4
RNA of C. elegans, was identified by Victor Ambros and his
colleagues Rosalind Lee and Rhonda Feinbaum (Bartel DP,
2004). They showed that the 21 nt lin-4 RNA represses its
target mRNA and controls part of C. elegans larval
development. The next small regulatory RNA to be
discovered was let-7, which controls a later developmental
stage of C. elegans (Lee et al., 1993; Wightman et al., 1993).
These RNAs were previously known as small temporal RNAs
(stRNAs) but are today recognized as the first members of
the large class of small regulatory non-coding RNA
molecules, the 'microRNAs'. This class of molecules is now
believed to be important not only in development but also in
the regulation of a wide range of biological processes (Gard
et al., 2006; Feng et al., 2009).
MicroRNAs are small non-coding RNAs of approximately
22 nt (range 19-25 nt) known to be involved in