Review

Strategies to Annotate and Characterize Long Noncoding RNAs: Advantages and Pitfalls

Huifen Cao,1 Claes Wahlestedt,2,* and Philipp Kapranov1,*

The past decade has seen an explosion of interest in long noncoding RNAs (lncRNAs). However, despite the massive volume of scientific data implicating these transcripts in a plethora of molecular and cellular processes, a great deal of controversy surrounds these RNAs. One of the main reasons for this lies in the multiple unique features of lncRNAs which limit the available methods used to characterize them. Combined with their vast numbers and inadequate classification, comprehensive annotation of these transcripts becomes a daunting task. The solution to this complex challenge likely lies in deep understanding of the strengths and weaknesses of each computational and empirical approach, and integration of multiple strategies to reduce noise, authenticate the results, and classify lncRNAs. We review here both the advantages and caveats of strategies commonly used for functional characterization and annotation of lncRNAs in the context of emerging conceptual guidelines for their application.

Background to lncRNAs

Most if not all of the mammalian genome serves as a template for the production of a multitude of long, and ostensibly mostly non-protein coding, RNA species [1,2]. lncRNAs, defined as transcripts longer than 200 nt with no apparent protein-coding capacity, outweigh mRNAs in both sequence complexity and mass in human cells [3,4]. This fact, now widely accepted by the scientific community [5,6], together with multiple reports linking lncRNAs with a host of biological processes [7] and disease states [8,9], has led to intense interest in these transcripts.
Conceptually, massive expansion in the non-coding genome and transcriptome in complex organisms, in contrast to the relatively constant repertoire of protein-coding genes, makes non-coding RNA a natural candidate for the informational molecule underlying the increase in organismal complexity [10,11]. However, although some lncRNAs such as Xist, ANRIL, H19, and others do belong to a pantheon of human genes crucial for development and disease [7], no consensus on the functional significance of most lncRNAs has yet been reached; in fact, this matter remains a subject of vigorous debate [12–15]. One of the primary reasons for this is that no empirically validated theoretical framework for predicting lncRNA function exists, in stark contrast to the clear and precise principles for inferring the function of protein-coding mRNAs based on sequence alone. While the mechanisms of action of some lncRNAs have been worked out in detail (reviewed in [7]), these examples are dwarfed by the staggering complexity of the lncRNA transcriptome, most of whose members have no known function. This makes any generalizations regarding function extremely tenuous and leaves a researcher no other option but to empirically characterize lncRNAs of interest using available methods of standard molecular biology and genetics in conjunction with emerging systems biology approaches. We review here the advantages and pitfalls of the available approaches in the context of conceptual guidelines for their application.

Highlights

Collectively, lncRNAs represent not only a very exciting and intriguing but also, methodologically, a very challenging group of transcripts to study. The challenges mainly stem from many unique features of these RNAs that create technical hurdles at every level of their annotation.
Multiple in silico and wet-bench strategies have been developed for every level of lncRNA annotation, starting with genomic architecture and basic annotation all the way to mechanistic and functional insights. However, each one comes with a unique set of advantages but also limitations and caveats, an understanding of which is crucial for proper implementation and interpretation of these methods. Integration of data from multiple approaches will very likely reduce biological and technological noise, authenticate the true mechanisms of lncRNA function, and help classify these transcripts.

1 Institute of Genomics, School of Biomedical Sciences, Huaqiao University, 201 Pan-Chinese S & T Building, 668 Jimei Road, Xiamen 361021, China
2 Center for Therapeutic Innovation and Department of Psychiatry and Behavioral Sciences, University of Miami Miller School of Medicine, 1501 North West 10th Avenue, Miami, FL 33136, USA

*Correspondence: cwahlestedt@med.miami.edu (C. Wahlestedt) and philippk08@hotmail.com (P. Kapranov).

Trends in Genetics. https://doi.org/10.1016/j.tig.2018.06.002
© 2018 Elsevier Ltd. All rights reserved.