PSCC: Sensitive and Reliable Population-Scale Copy Number Variation Detection Method Based on Low Coverage Sequencing

Xuchao Li 1☯, Shengpei Chen 1,4☯, Weiwei Xie 1☯, Ida Vogel 2, Kwong Wai Choy 3, Fang Chen 1, Rikke Christensen 2, Chunlei Zhang 1, Huijuan Ge 1, Haojun Jiang 1,4, Chang Yu 1, Fang Huang 5, Wei Wang 1,6, Hui Jiang 1*, Xiuqing Zhang 1,7*

1 BGI-Shenzhen, Shenzhen, China, 2 Department of Clinical Genetics, Aarhus University Hospital, Aarhus, Denmark, 3 Department of Obstetrics and Gynaecology, The Chinese University of Hong Kong, Shatin, NT, Hong Kong, 4 State Key Laboratory of Bioelectronics, School of Biological Science and Medical Engineering, Southeast University, Nanjing, China, 5 Guangzhou Children’s Social Welfare Home, Guangzhou, China, 6 Clinical laboratory of BGI Health, Shenzhen, China, 7 The Guangdong Enterprise Key Laboratory of Human Disease Genomics, BGI-Shenzhen, Shenzhen, China

Abstract

Background: Copy number variations (CNVs) represent an important type of genetic variation that deeply impacts phenotypic polymorphisms and human diseases. The advent of high-throughput sequencing technologies provides an opportunity to revolutionize the discovery of CNVs and to explore their relationship with diseases. However, most existing methods depend on sequencing depth and become unstable at low sequence coverage. In this study, we developed an effective population-scale CNV calling (PSCC) method based on low coverage whole-genome sequencing (LCS).

Methodology/Principal Findings: In our novel method, a two-step correction was used to remove biases caused by local GC content and complex genomic characteristics. We chose a binary segmentation method to locate CNV segments and designed combined statistical tests to ensure stable false positive control. The simulation data showed that our PSCC method could achieve sensitivity/specificity of 99.7%/100% and 98.6%/100% for calling CNVs over 300 kb under LCS (~2×) and ultra LCS (~0.2×), respectively. Finally, we applied this novel method to analyze 34 clinical samples with an average of 2× LCS. In the final results, all 31 pathogenic CNVs identified by aCGH were successfully detected. In addition, the performance comparison revealed that our method had significant advantages over existing methods when using ultra LCS.

Conclusions/Significance: Our study showed that PSCC can sensitively and reliably detect CNVs using low coverage or even ultra-low coverage data through population-scale sequencing.

Citation: Li X, Chen S, Xie W, Vogel I, Choy KW, et al. (2014) PSCC: Sensitive and Reliable Population-Scale Copy Number Variation Detection Method Based on Low Coverage Sequencing. PLoS ONE 9(1): e85096. doi:10.1371/journal.pone.0085096

Editor: Ali Torkamani, The Scripps Research Institute, United States of America

Received August 8, 2013; Accepted November 22, 2013; Published January 21, 2014

Copyright: © 2014 Li et al. This is an open-access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.

Funding: This study was funded by Key Laboratory Project in Shenzhen (CXB200903110066A and CXB201108250096A). The funders had no role in study design, data collection and analysis, decision to publish, or preparation of the manuscript.

Competing Interests: The authors have declared that no competing interests exist.

* E-mail: jianghui@genomics.org.cn (Hui Jiang); zhangxq@genomics.org.cn (XZ)

☯ These authors contributed equally to this work.

Introduction

Copy number variations (CNVs) are known to be an important component of structural variation in the human genome, resulting from a mixture of meiotic recombination, homology-directed and non-homologous repair of double-strand breaks, and errors in replication [1]. CNVs comprise duplication, deletion and multi-allelic variation events of genetic material 1 kb or larger in size, and might have functional impact through gene expression and dosage [2,3]. It has been reported that CNVs confer high risk for inherited diseases, complex diseases and cancer, such as autism spectrum disorders [4], systemic lupus erythematosus [5] and neuroblastoma [6]. Common CNVs present in more than 1% of the population are defined as copy number polymorphisms (CNPs). These polymorphisms may contribute to phenotypic variations and differences in disease susceptibility across different ethnic groups [6,7]. Therefore, the detection and population-scale association analysis of CNVs are necessary for the study of migration and evolution, as well as for clinical diagnosis.

For the last 10 years, the Array Comparative Genomic Hybridization (aCGH) and Multiplex Ligation-dependent Probe Amplification (MLPA) methods have produced an extensive literature on the detection of CNVs [8,9,10]. Recently, massively parallel sequencing has begun to offer genome-scale detection of CNVs through high-throughput, high-resolution methods. The Paired-End Read Mapping (PEM) strategy was the first sequencing-based strategy to detect CNVs. It can identify both insertions and deletions at kilobase-level resolution by comparing the mapped distance between paired reads with the average library insert size, though it is unable to detect insertions larger
PLOS ONE | www.plosone.org 1 January 2014 | Volume 9 | Issue 1 | e85096