Research Article

RECORD: Reference-Assisted Genome Assembly for Closely Related Genomes

Krisztian Buza, Bartek Wilczynski, and Norbert Dojer

Faculty of Mathematics, Informatics and Mechanics (MIM), University of Warsaw, Banacha 2, 02-097 Warsaw, Poland

Correspondence should be addressed to Krisztian Buza; buza@biointelligence.hu

Received 18 March 2015; Revised 27 May 2015; Accepted 31 May 2015

Academic Editor: Chun-Yuan Lin

Hindawi Publishing Corporation, International Journal of Genomics, Volume 2015, Article ID 563482, 10 pages, http://dx.doi.org/10.1155/2015/563482

Copyright © 2015 Krisztian Buza et al. This is an open access article distributed under the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.

Background. Next-generation sequencing technologies now produce, in a single experiment, reads whose total length is multiple times the genome size. This is enough information to reconstruct at least some of the differences between the individual genome studied in the experiment and the reference genome of the species. However, in most typical protocols, this information is disregarded and the reference genome is used. Results. We provide a new approach that allows researchers to reconstruct genomes very closely related to the reference genome (e.g., mutants of the same species) directly from the reads used in the experiment. Our approach applies de novo assembly software to experimental reads and so-called pseudoreads and uses the resulting contigs to generate a modified reference sequence. In this way, it can very quickly, and at no additional sequencing cost, generate a new, modified reference sequence that is closer to the actual sequenced genome and has full coverage. In this paper, we describe our approach and test its implementation, called RECORD. We evaluate RECORD on both simulated and real data. We have made our software publicly available on SourceForge. Conclusion. Our tests show that on closely related sequences RECORD outperforms more general assisted-assembly software.

1. Background

The emergence of population genomic projects leads to an ever-growing need for software and methods that facilitate studying closely related organisms with next-generation sequencing technologies. This includes determining the genomic sequences of individuals in the presence of the more generic reference genome of the species. This task is known as reference-assisted genome assembly, and many ongoing research projects depend on an accurate solution to this problem.

In recent years, next-generation sequencing technologies have brought us the possibility to simultaneously sequence millions of short DNA fragments in a DNA library prepared from almost any biochemical experiment [1]. Great improvement in the quality and amount of short reads obtained from a single experiment allowed for the development of many more biochemical assays [2], such as MNase-seq [3], DNase-seq [4], or ChIA-PET [5], in addition to the more standard ChIP-seq [6] or RNA-seq [7]. Similarly, next-generation sequencing techniques may be applied to metagenomic samples, returning short reads originating from multiple genomes, including some potentially unknown species.

Importantly, many of these techniques require prior knowledge of the reference genome of the species for which the experiment was performed. This genome sequence is used to map the reads and obtain the final readout of the experiment as read counts per base pair. Such procedures are guaranteed to work very well only under the assumption that we know the exact sequence of the genome under study. There are, however, many biologically relevant cases in which this assumption cannot be satisfied. For example, in quickly growing cell populations such as cancer cell lines or microbial colonies, even rare mutations can become fixed in the population very quickly. This leads to situations where sampled sequences can differ significantly from the original reference genome. Similarly, many lab experiments involve genetically modified cells or organisms. While these modifications are usually controlled as much as possible, the researchers frequently do not know the exact landing site of the introduced