DOI: 10.1002/minf.201600141

ParSel: Parallel Selection of Micro-RNAs for Survival Classification in Cancers

Debajyoti Sinha,[a] Debarka Sengupta,*[b] and Sanghamitra Bandyopadhyay*[c]

Abstract: It is known that tumor micro-RNAs (miRNAs) can define patient survival and treatment response. We present a framework to identify miRNAs that are predictive of cancer survival. The framework ranks the miRNAs by exploring their collaborative role in gene regulation. Our approach tests a large number of combinatorial cases by leveraging parallel computation. We avoid parametric assumptions in evaluating miRNA expression and instead use rigorous statistical computation to assign an importance score to each miRNA. Experimental results on three cancer types, namely KIRC, OV, and GBM, show that the top-ranked miRNAs obtained using the proposed framework yield better classification accuracy than some best-practice variable selection methods. Some of these top-ranked miRNAs are also known to be associated with related diseases.

Keywords: miRNA prediction · classification · feature selection · cancer survival

1. Introduction

MicroRNAs (miRNAs) are short (~22 nt) non-coding RNA molecules that are present in nearly all multicellular organisms and highly conserved in evolution. miRNAs have been found to influence the output of many protein-coding genes [1, 2, 3]. They play important roles in most cellular processes, especially in development and regulation of the cell cycle. There have been strong indications that changes in the expression of miRNA genes contribute to human diseases, including cancers [4, 5, 6, 7].
Studies in the last decade demonstrated that miRNAs bind to mRNAs to trigger translational repression and degradation, and it is now believed that miRNAs primarily participate in translational control followed by mRNA destabilization [8]. Aberrant expression patterns of miRNAs have been implicated in controlling the expression of their target mRNAs to promote tumor growth, invasion, angiogenesis, and immune evasion [9, 10, 11]. Further, studies suggest that some miRNAs may function as oncogenes or tumor suppressors [12, 13]. Tumor miRNA profiles can define relevant subtypes, patient survival, and treatment response [14, 15, 16]. miRNAs can also down-regulate genes with oncogenic activity, thereby acting as tumor suppressors [17, 4], and are thus potential therapeutic agents.

Conventional treatment of individual cancer patients involves assessment of clinical information such as age, tumor size, cell type of origin, tumor stage, and molecular subtype. This information guides the course of treatment and the prediction of patient survival. These predictions improve when the clinical features are supplemented by a large number of genomic variables [18, 19]. Given the bewildering complexity of cancer, it is also likely that several molecular factors can explain the state of a tumor better than the clinical parameters alone. However, systematic analysis of multiple 'omics' data sets (proteome, transcriptome, genome, methylome) is largely hampered by the sheer number of molecular descriptors. To identify the most relevant set of variables, techniques of statistical machine learning are frequently used [18, 19]. Lu et al. reported that miRNA expression profiles can classify poorly differentiated tumors even when messenger RNA profiles of the same samples fail to do so [14].
[a] D. Sinha, Indian Statistical Institute, Kolkata 700108
[b] D. Sengupta, Indraprastha Institute of Information Technology, Delhi 110020. E-mail: debarka@iiitd.ac.in
[c] S. Bandyopadhyay, Indian Statistical Institute, Kolkata 700108. E-mail: sanghami@isical.ac.in

Full Paper. Mol. Inf. 2017, 36, 1600141. © 2017 Wiley-VCH Verlag GmbH & Co. KGaA, Weinheim. www.molinf.com

Table 1. Dataset summary.

Dataset                                     #miRNA   #Patient samples   Cutoff   Synapse ID
Kidney renal clear cell carcinoma (KIRC)    1045     150                4 yrs    syn1710291
Ovarian serous cystadenocarcinoma (OV)      798      252                3 yrs    syn1710359
Glioblastoma multiforme (GBM)               533      155                1 yr     syn1710368

Studying miRNAs in the context of cancer has provided numerous insights into this disease and introduced a new class of diagnostics and therapeutics. Many successful clinical trials have paved the way for miRNAs to enter clinics as diagnostic and prognostic