Open Journal of Statistics, 2016, 6, 628-636
Published Online August 2016 in SciRes. http://www.scirp.org/journal/ojs
http://dx.doi.org/10.4236/ojs.2016.64053

How to cite this paper: López-Kleine, L. and González-Prieto, C. (2016) Challenges Analyzing RNA-Seq Gene Expression Data. Open Journal of Statistics, 6, 628-636. http://dx.doi.org/10.4236/ojs.2016.64053

Challenges Analyzing RNA-Seq Gene Expression Data

Liliana López-Kleine, Cristian González-Prieto
Department of Statistics, Universidad Nacional de Colombia - Sede Bogotá, Bogotá, Colombia

Received 25 June 2016; accepted 16 August 2016; published 19 August 2016

Copyright © 2016 by authors and Scientific Research Publishing Inc. This work is licensed under the Creative Commons Attribution International License (CC BY). http://creativecommons.org/licenses/by/4.0/

Abstract
The analysis of messenger ribonucleic acid data obtained through sequencing techniques (RNA-sequencing) is very challenging. Once technical difficulties have been resolved, an important choice has to be made during pre-processing: either transform the RNA-sequencing count data to a continuous variable or continue to work with count data. For each data type, analysis tools have been developed and seem appropriate at first sight, but a deeper analysis of data distribution and structure is worth discussing. In this review, open questions regarding the nature of RNA-sequencing data are discussed and highlighted, indicating important future research topics in statistics that should be addressed for a better analysis of already available and newly appearing gene expression data. Moreover, a comparative analysis of RNA-seq count and transformed data is presented. This comparison indicates that transforming RNA-seq count data seems appropriate, at least for differential expression detection.

Keywords
RNA-Seq Analysis, Count Data, Preprocessing, Differential Expression, Gene Co-Expression Network

1. Introduction
The sequencing of messenger RNA transcripts (RNA-seq) is a recently developed approach to gene expression or transcriptome profiling that uses deep-sequencing technologies. Studies using this method have made it possible to assess the complexity of transcriptomes. RNA-seq also provides more precise measurement of the levels of transcripts and their isoforms than the hybridization-based methods (such as microarrays) that were used previously, but it also poses new challenges [1]. Major issues concerning the identification of the real number of RNA fragments, taking into account isoforms and mitochondrial and ribosomal RNA, have appeared but are beyond the scope of this review. Several satisfactory developments assure a good characterization of RNA-seq transcripts