DOI: 10.1002/minf.201600141
ParSel: Parallel Selection of Micro-RNAs for Survival
Classification in Cancers
Debajyoti Sinha,[a] Debarka Sengupta,*[b] and Sanghamitra Bandyopadhyay*[c]
Abstract: Tumor micro-RNAs (miRNAs) are known to help
define patient survival and treatment response. We present
a framework to identify miRNAs that are predictive of
cancer survival. The framework ranks miRNAs by exploring
their collaborative role in gene regulation. Our approach
tests a large number of combinatorial cases by leveraging
parallel computation. We avoid parametric assumptions in
the evaluation of miRNA expression and instead use rigorous
statistical computation to assign an importance score to
each miRNA. Experimental results on three cancer types,
namely KIRC, OV and GBM, show that the top-ranked miRNAs
obtained using the proposed framework produce better
classification accuracy than some best-practice variable
selection methods. Some of these top-ranked miRNAs are
also known to be associated with related diseases.
Keywords: miRNA prediction · classification · feature selection · cancer survival
1. Introduction
MicroRNAs (miRNAs) are short (~22 nt) non-coding RNA
molecules that are present in nearly all multicellular
organisms and are highly conserved in evolution. miRNAs have
been found to influence the output of many protein-coding
genes [1, 2, 3]. They play important roles in most cellular
processes, especially in development and regulation of the
cell cycle. There have been strong indications that changes
in the expression of miRNA genes contribute to human
diseases, including cancers [4, 5, 6, 7]. Studies in the
last decade demonstrated that miRNAs bind to mRNAs to
trigger translational repression and degradation, and it is
now believed that miRNAs primarily participate in
translational control followed by mRNA destabilization [8].
Aberrant expression patterns of miRNAs have been
implicated in controlling the expression of their target
mRNAs to promote tumor growth, invasion, angiogenesis,
and immune evasion [9, 10, 11]. Further, studies suggest
that some miRNAs may function as oncogenes or tumor
suppressors [12, 13]. Tumor miRNA profiles can define
relevant subtypes, patient survival, and treatment
response [14, 15, 16]. miRNAs can also down-regulate genes
with oncogenic activity and thus act as tumor suppressors
[17, 4], which makes them potential therapeutic agents.
Conventional treatment of individual cancer patients
involves assessment of clinical information such as age,
tumor size, cell type of origin, tumor stage, and molecular
subtype. This information guides the course of treatment and
the prediction of patient survival. These predictions
improve when the clinical features are supplemented by a
large number of genomic variables [18, 19]. Also,
considering the bewildering complexity of cancer, it is
likely that several molecular factors can explain the state
of a tumor better than the clinical parameters alone.
However, systematic analysis of multiple 'omics' data sets
(proteome, transcriptome, genome, methylome) is largely
obscured by the sheer number of molecular descriptors. To
identify the most relevant set of variables, techniques of
statistical machine learning are frequently used [18, 19].
Lu et al. reported that miRNA expression profiles can
successfully classify poorly differentiated tumors, even
where messenger RNA profiles of the same samples failed
[14]. Studying miRNAs in the context of cancer
has provided innumerable insights into this disease and
introduced a new legion of diagnostics and therapeutics.
Many successful clinical trials have paved the way for
miRNAs to enter clinics as diagnostic and prognostic
[a] D. Sinha
Indian Statistical Institute, Kolkata 700108
[b] D. Sengupta
Indraprastha Institute of Information Technology, Delhi 110020.
E-mail: debarka@iiitd.ac.in
[c] S. Bandyopadhyay
Indian Statistical Institute, Kolkata 700108
E-mail: sanghami@isical.ac.in
Table 1. Dataset Summary
Dataset                                     #miRNA  #Patient Samples  Cutoff  Synapse ID
Kidney renal clear cell carcinoma (KIRC)    1045    150               4 yrs   syn1710291
Ovarian serous cystadenocarcinoma (OV)      798     252               3 yrs   syn1710359
Glioblastoma multiforme (GBM)               533     155               1 yr    syn1710368
Full Paper www.molinf.com
© 2017 Wiley-VCH Verlag GmbH & Co. KGaA, Weinheim Mol. Inf. 2017, 36, 1600141 (1 of 10) 1600141