Scientific Reports | 6:32942 | DOI: 10.1038/srep32942
www.nature.com/scientificreports

CRISPRdigger: detecting CRISPRs with better direct repeat annotations

Ruiquan Ge 1,2,*, Guoqin Mai 1,3,*, Pu Wang 1,2, Manli Zhou 1,2, Youxi Luo 4, Yunpeng Cai 1 & Fengfeng Zhou 5,6

Clustered regularly interspaced short palindromic repeats (CRISPRs) are important genetic elements in many bacterial and archaeal genomes, and play a key role in prokaryote immune systems' fight against invasive foreign elements. The CRISPR system has also been engineered to facilitate target gene editing in eukaryotic genomes. Using the common features of mis-annotated CRISPRs in prokaryotic genomes, this study proposed an accurate de novo CRISPR annotation program, CRISPRdigger, which can take a partially assembled genome as its input. A comprehensive comparison with the three existing programs demonstrated that CRISPRdigger can recover more Direct Repeats (DRs) for CRISPRs and achieve a higher accuracy for a query genome. The program was implemented in Perl and all the parameters had default values, so that a user could annotate CRISPRs in a query genome by supplying only a genome sequence in the FASTA format. All the supplementary data are available at http://www.healthinformaticslab.org/supp/.

Clustered regularly interspaced short palindromic repeats (CRISPRs) are essential genetic factors in prokaryotic genomes 1, and actively acquire template sequences from invasive elements such as phages for later sequence-specific cleavage 2,3. These template foreign sequences vary in length from 24 to 48 bp and are separated by conserved repeats 4,5. A CRISPR is usually transcribed by a neighbouring CRISPR-associated (Cas) gene binding to its leader region on the closely flanking region 5. The CRISPR/Cas system serves as an anti-invasion immune mechanism in over 40% of sequenced prokaryotic genomes 6.
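The repeat-spacer layout described above (foreign sequences of 24 to 48 bp separated by conserved repeats) can be illustrated with a small toy scan. The sketch below is not CRISPRdigger's algorithm; it is a minimal illustration that assumes exact, fixed-length repeats. The function name `find_repeat_arrays`, its parameters and their defaults are hypothetical choices made only for this example.

```python
from collections import defaultdict


def find_repeat_arrays(genome, k=8, min_spacer=24, max_spacer=48, min_copies=3):
    """Toy scan for exact k-mers that recur with spacer-sized gaps.

    Returns a list of (kmer, start_positions) tuples. All parameter
    names and defaults are illustrative assumptions, not CRISPRdigger's.
    """
    # Index every start position of every k-mer in the sequence.
    positions = defaultdict(list)
    for i in range(len(genome) - k + 1):
        positions[genome[i:i + k]].append(i)

    arrays = []
    for kmer, starts in positions.items():
        chain = [starts[0]]
        for s in starts[1:]:
            # Spacer length between the end of the previous copy and this one.
            gap = s - (chain[-1] + k)
            if min_spacer <= gap <= max_spacer:
                chain.append(s)
            else:
                if len(chain) >= min_copies:
                    arrays.append((kmer, chain))
                chain = [s]
        if len(chain) >= min_copies:
            arrays.append((kmer, chain))
    return arrays


if __name__ == "__main__":
    # Synthetic sequence: one 8-bp repeat, three artificial 30-bp spacers.
    repeat = "GTTTTAGA"
    spacers = ["A" * 30, "C" * 30, "G" * 30]
    toy = "ACGTACGTAC" + repeat + "".join(sp + repeat for sp in spacers) + "ACGTACGTAC"
    for kmer, starts in find_repeat_arrays(toy):
        print(kmer, starts)
```

Real direct repeats are longer than 8 bp and may carry mismatches, so a scan like this would report many overlapping k-mers for a single array and miss degenerate copies. It is meant only to make the repeat-spacer geometry concrete.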
The CRISPR/Cas system is attracting considerable attention as a eukaryotic genome-editing technology 7–9 because it cuts specific sequence signals 10. The most well-developed enzyme is the nuclease Cas9 from the bacterium Streptococcus pyogenes 11, which can be directed to cut at any genomic location by an appropriate guide RNA fragment 12. Another widely used genome-editing technology, TALEN 13, requires the researcher to synthesize a nuclease for each target genomic location, which is much more costly and time-consuming than synthesizing only a guide RNA 14. With the increasing target specificity demanded by clinical applications, a number of Cas9 mutants have been introduced with over 50-fold higher specificity 15,16.

A few computer programs have been developed for the de novo detection of natural CRISPRs in prokaryotic genomes. Since their discovery in the 1980s 17, CRISPRs have been detected in 47.14% of the 2,762 analysed prokaryotic genomes, with 1.47 CRISPRs per genome 6. However, prokaryotic genomes are being sequenced at an accelerating rate, and 4,278 genomes were included in the NCBI Microbial Genome database as of 25 September 2015 18. Consequently, the de novo annotation of CRISPRs in a newly completed prokaryotic genome is necessary for a better understanding of this immune system. PILER-CR 19 was derived from the repeat detection program PILER 20, and screens for CRISPRs in a small genome. CRT screens for exact k-mer/k-nucleotide repeats in a genome, and concatenates the neighbouring repeats into candidate CRISPRs 21.

1 Shenzhen Institutes of Advanced Technology, and Key Lab for Health Informatics, Chinese Academy of Sciences, Shenzhen, Guangdong 518055, China.
2 Shenzhen College of Advanced Technology, University of Chinese Academy of Sciences, Shenzhen, Guangdong 518055, China.
3 Center for Synthetic Biology Engineering Research, Shenzhen Institutes of Advanced Technology, Chinese Academy of Sciences, Shenzhen, Guangdong 518055, China.
4 School of Science, Hubei University of Technology, Wuhan, Hubei 430068, China.
5 College of Computer Science and Technology, Changchun, Jilin 130012, China.
6 Key Laboratory of Symbolic Computation and Knowledge Engineering of Ministry of Education, Jilin University, Changchun, Jilin 130012, China.
*These authors contributed equally to this work. Correspondence and requests for materials should be addressed to F.Z. (email: FengfengZhou@gmail.com or fzhou@jlu.edu.cn)

Received: 20 May 2016
Accepted: 12 August 2016
Published: 06 September 2016

CRISPRFinder uses Vmatch 22