Review
Strategies to Annotate and Characterize Long Noncoding RNAs: Advantages and Pitfalls
Huifen Cao,1 Claes Wahlestedt,2,* and Philipp Kapranov1,*
The past decade has seen an explosion of interest in long noncoding RNAs
(lncRNAs). However, despite the massive volume of scientific data implicating
these transcripts in a plethora of molecular and cellular processes, a great deal
of controversy surrounds these RNAs. One of the main reasons for this lies in
the multiple unique features of lncRNAs which limit the available methods used
to characterize them. Combined with their vast numbers and inadequate classification,
these features make comprehensive annotation of these transcripts a daunting
task. The solution to this complex challenge likely lies in a deep understanding of
the strengths and weaknesses of each computational and empirical approach,
and integration of multiple strategies to reduce noise, authenticate the results,
and classify lncRNAs. We review here both the advantages and caveats of
strategies commonly used for functional characterization and annotation of
lncRNAs in the context of emerging conceptual guidelines for their application.
Background to lncRNAs
Most if not all of the mammalian genome serves as a template for the production of a multitude of
long, and ostensibly mostly non-protein coding, RNA species [1,2]. lncRNAs – defined as transcripts
longer than 200 nt with no apparent protein-coding capacity – outweigh mRNAs in both
sequence complexity and mass in human cells [3,4]. This fact, now widely accepted by the
scientific community [5,6], together with multiple reports linking lncRNAs with a host of biological
processes [7] and disease states [8,9], has led to intense interest in these transcripts. Conceptually,
the massive expansion of the non-coding genome and transcriptome in complex organisms, in
contrast to the relatively constant repertoire of protein-coding genes, makes non-coding RNA a
natural candidate for the informational molecule underlying the increase in organismal complexity
[10,11]. However, although some lncRNAs such as Xist, ANRIL, H19, and others do belong to a
pantheon of human genes crucial for development and disease [7], no consensus on the functional
significance of most lncRNAs has yet been reached; in fact, this matter remains a subject of
vigorous debate [12–15]. One of the primary reasons for this is that no empirically validated
theoretical framework for predicting lncRNA function exists, in stark contrast to the clear and
precise principles for inferring the function of protein-coding mRNAs based on sequence alone.
While the mechanisms of action of some lncRNAs have been worked out in detail (reviewed in [7]),
these examples are dwarfed by the staggering complexity of the lncRNA transcriptome, the function
of most of which remains unknown. This makes any generalizations regarding function
extremely tenuous and leaves a researcher no other option but to empirically characterize lncRNAs
of interest using available methods of standard molecular biology and genetics in conjunction with
emerging systems biology approaches. We review here the advantages and pitfalls of the available
approaches in the context of conceptual guidelines for their application.
Highlights
Collectively, lncRNAs represent not only a very exciting and intriguing but also, methodologically, a very challenging group of transcripts to study. The challenges mainly stem from many unique features of these RNAs that create technical hurdles at every level of their annotation.
Multiple in silico and wet-bench strategies have been developed for every level of lncRNA annotation, starting with genomic architecture and basic annotation all the way to mechanistic and functional insights. However, each one comes with a unique set of advantages, but also limitations and caveats, an understanding of which is crucial for proper implementation and interpretation of these methods.
Integration of data from multiple approaches will very likely reduce biological and technological noise, authenticate the true mechanisms of lncRNA function, and classify these transcripts.
1 Institute of Genomics, School of Biomedical Sciences, Huaqiao University, 201 Pan-Chinese S & T Building, 668 Jimei Road, Xiamen 361021, China
2 Center for Therapeutic Innovation and Department of Psychiatry and Behavioral Sciences, University of Miami Miller School of Medicine, 1501 North West 10th Avenue, Miami, FL 33136, USA
*Correspondence: cwahlestedt@med.miami.edu (C. Wahlestedt) and philippk08@hotmail.com (P. Kapranov).
Trends in Genetics, Month Year, Vol. xx, No. yy https://doi.org/10.1016/j.tig.2018.06.002 1
© 2018 Elsevier Ltd. All rights reserved.