Article

WebCircRNA: Classifying the Circular RNA Potential of Coding and Noncoding RNA

Xiaoyong Pan 1,2,3, Kai Xiong 2,4, Christian Anthon 1,2,4, Poul Hyttel 2,4, Kristine K. Freude 2,4, Lars Juhl Jensen 1,3,* and Jan Gorodkin 1,2,4,*

1 Center for Non-Coding RNA in Technology and Health, University of Copenhagen, 1870 Frederiksberg C, Denmark; xypan172436@gmail.com (X.P.); anthon@rth.dk (C.A.)
2 Department of Veterinary and Animal Sciences, University of Copenhagen, 1870 Frederiksberg C, Denmark; hpw927@alumni.ku.dk (K.X.); poh@sund.ku.dk (P.H.); kkf@sund.ku.dk (K.K.F.)
3 Department of Disease Systems Biology, Novo Nordisk Foundation Center for Protein Research, University of Copenhagen, 2200 Copenhagen, Denmark
4 BrainStem—Stem Cell Center of Excellence in Neurology, University of Copenhagen, 1870 Frederiksberg C, Denmark
* Correspondence: lars.juhl.jensen@cpr.ku.dk (L.J.J.); gorodkin@rth.dk (J.G.)

Received: 7 September 2018; Accepted: 2 November 2018; Published: 6 November 2018

Abstract: Circular RNAs (circRNAs) are increasingly recognized to play crucial roles in post-transcriptional gene regulation, including functioning as microRNA (miRNA) sponges or as widespread regulators, for example in stem cell differentiation. It is therefore highly relevant to identify whether a transcript of interest can also function as a circRNA. Here, we present a user-friendly web server that predicts whether coding and noncoding RNAs have circRNA isoforms and whether circRNAs are expressed in stem cells. The predictions are made by random forest models using sequence-derived features as input. The output scores are converted to fractiles, which are used to assess the circRNA and stem cell potential. The performance of the three models, reported as the area under the receiver operating characteristic (ROC) curve, is 0.82 for coding genes, 0.89 for long noncoding RNAs (lncRNAs) and 0.72 for stem cell expression.
We present WebCircRNA for quick evaluation of human genes and transcripts for their circRNA potential, which can be essential in several contexts.

Keywords: circular RNA; random forest; noncoding RNA

Genes 2018, 9, 536; doi:10.3390/genes9110536; www.mdpi.com/journal/genes

1. Introduction

Circular RNAs (circRNAs) were recently discovered to be widespread, abundant, expressed across species, and implicated in several diseases. They are created by non-linear backsplicing between a splice donor and an upstream splice acceptor, and evidence is emerging that they play functional roles as microRNA (miRNA) sponges [1,2] and in the regulation of gene splicing and transcription [3]. Recently, the miR-7 sponge CDR1as was found to be involved in stem cell regulation in the periodontal ligament [4]. Other studies suggest that circRNAs can encode proteins [5], and 90% of the 92,375 human circRNAs in the circBase database (v0.1) [6] arise from protein-coding genes (PCGs). The number of discovered circRNAs has increased rapidly in recent years owing to the development of new high-throughput sequencing technologies, and circBase now contains more than 90,000 circRNA transcripts [6]. In addition, circRNAs are expressed in a cell- and tissue-specific manner [2]; for example, 16,017 are expressed in stem cells, and they are especially prominent during embryonic development [7].

Current computational pipelines focus on identifying the presence of backsplicing junction-spanning reads in RNA-seq data [8]. Commonly, pipelines to identify circRNAs map