Open Journal of Statistics, 2016, 6, 628-636
Published Online August 2016 in SciRes. http://www.scirp.org/journal/ojs
http://dx.doi.org/10.4236/ojs.2016.64053
How to cite this paper: López-Kleine, L. and González-Prieto, C. (2016) Challenges Analyzing RNA-Seq Gene Expression Data.
Open Journal of Statistics, 6, 628-636. http://dx.doi.org/10.4236/ojs.2016.64053
Challenges Analyzing RNA-Seq Gene Expression Data
Liliana López-Kleine, Cristian González-Prieto
Department of Statistics, Universidad Nacional de Colombia—Sede Bogotá, Bogotá, Colombia
Received 25 June 2016; accepted 16 August 2016; published 19 August 2016
Copyright © 2016 by authors and Scientific Research Publishing Inc.
This work is licensed under the Creative Commons Attribution International License (CC BY).
http://creativecommons.org/licenses/by/4.0/
Abstract
The analysis of messenger ribonucleic acid data obtained through sequencing techniques
(RNA-sequencing) is very challenging. Once technical difficulties have been sorted out, an
important choice has to be made during pre-processing. Two different paths can be chosen:
transform RNA-sequencing count data to a continuous variable, or continue to work with count
data. For each data type, analysis tools have been developed and seem appropriate at first sight,
but a deeper analysis of data distribution and structure is worth discussing. In this review, open
questions regarding the nature of RNA-sequencing data are discussed and highlighted, indicating
important future research topics in statistics that should be addressed for a better analysis of
already available and newly appearing gene expression data. Moreover, a comparative analysis of
RNA-seq count and transformed data is presented. This comparison indicates that transforming
RNA-seq count data seems appropriate, at least for differential expression detection.
Keywords
RNA-Seq Analysis, Count Data, Preprocessing, Differential Expression, Gene Co-Expression Network
1. Introduction
The sequencing of messenger RNA transcripts (RNA-seq) is a recently developed approach to gene
expression or transcriptome profiling that uses deep-sequencing technologies. Studies using this
method have allowed the complexity of transcriptomes to be assessed. RNA-seq also provides more
precise measurement of the levels of transcripts and their isoforms than the hybridization-based
methods (such as microarrays) that were used previously, but it also poses new challenges [1].
Major issues concerning the identification of the real number of RNA fragments, taking into
account isoforms and mitochondrial and ribosomal RNA, have appeared but are beyond the scope of
this review. Several satisfactory developments assure a good characterization of RNA-seq transcripts