Structural variation detection using next-generation sequencing data: A comparative technical review

Peiyong Guan a, Wing-Kin Sung a,b,⇑

a School of Computing, National University of Singapore, 117543, Singapore
b Computational & Mathematical Biology Group, Genome Institute of Singapore, 138672, Singapore

Article info

Article history:
Received 18 October 2015
Received in revised form 9 January 2016
Accepted 31 January 2016
Available online xxxx

Keywords:
Structural variation
Next-generation sequencing

Abstract

Structural variations (SVs) are mutations in the genome of size at least fifty nucleotides. They contribute to phenotypic differences among healthy individuals and can cause severe diseases, even cancers, by breaking or linking genes. Thus, it is crucial to systematically profile SVs in the genome. In the past decade, many next-generation sequencing (NGS)-based SV detection methods have been proposed, owing to the significant reduction in the cost of NGS experiments and the ability of NGS to detect SVs without bias at base-pair resolution. These SV detection methods vary in both sensitivity and specificity, since they use different SV-property-dependent and library-property-dependent features. As a result, predictions from different SV callers are often inconsistent. In addition, noise in the data (both platform-specific sequencing errors and artificial chimeric reads) reduces the specificity of SV detection. Poorly characterized regions of the human genome (e.g., repeat regions) greatly affect read mapping and, in turn, SV calling accuracy. Calling complex SVs requires specialized SV callers. Apart from accuracy, the processing speed of an SV caller is another factor that determines its usability. Knowing the pros and cons of different SV calling techniques, as well as the objectives of the biological study, is essential for biologists and bioinformaticians to make informed decisions.
This paper describes the different components of the SV calling pipeline and reviews the techniques used by existing SV callers. Through a simulation study, we also demonstrate that library properties, especially insert size, greatly impact the sensitivity of different SV callers. We hope the community can benefit from this work both in designing new SV calling methods and in selecting the appropriate SV caller for specific biological studies.

© 2016 Elsevier Inc. All rights reserved.

Contents

1. Structural variations (SVs)
2. The SV calling pipeline
    2.1. Data preprocessing
        2.1.1. Reads mapping
        2.1.2. Reads filtering
        2.1.3. Reads classification
    2.2. SV discovery
        2.2.1. Direct vs. indirect cases
        2.2.2. SV discovery techniques
        2.2.3. Hybrid-approach for SV discovery
    2.3. SV verification
    2.4. SV annotation
    2.5. SV visualization
3. SV and library properties impact SV calling
    3.1. SV properties impact SV calling

⇑ Corresponding author at: School of Computing, National University of Singapore, 117543, Singapore.
E-mail address: ksung@comp.nus.edu.sg (W.-K. Sung).

Methods xxx (2016) xxx–xxx
Journal homepage: www.elsevier.com/locate/ymeth
http://dx.doi.org/10.1016/j.ymeth.2016.01.020
1046-2023