NPGe, a new tool for closely related genomes alignment and analysis

Boris Nagaev 1,2, Maxim Nikolaev 2, Andrei Alexeevski 1,2,3

1 Moscow State University, A.N. Belozersky Institute, Leninskye gory 1-40, Moscow 119992, Russia
2 Moscow State University, Faculty of Bioengineering and Bioinformatics, Leninskye gory 1-73, Moscow 119234, Russia
3 Scientific Research Institute for System Studies, the Russian Academy of Science (NIISI RAS), Moscow 117281, Russia

aba@belozersky.msu.ru

Alignment of highly similar genomic sequences is needed for several purposes. First, aligning the genomes of closely related prokaryotes makes it possible to reconstruct evolutionary events and to improve gene annotations. Second, such alignments are useful for comparing genome assemblies of the same organism or of closely related ones, and in a number of cases they allow the assembly to be improved. Of particular interest is the comparison of assemblies of genomes with divergent haplotypes (10–15% SNPs).

Because of the high sequence similarity, orthologous fragments of genomic sequences can be determined almost unambiguously just by percent sequence identity above a certain threshold (e.g. 90%). Here we take this property as the definition of closely related genomes. Ideally, all differences between highly similar genomes could be detected to within a couple of dozen nucleotides. In practice, the available multiple genome aligners (progressiveMAUVE, VISTA, etc.) are universal and do not fully exploit high sequence similarity. This is why we developed a new tool designed primarily for multiple alignment of genomes of closely related organisms. The goals of our work were to develop tools for detailed analysis of evolutionary events and for comparison of genome annotations. Achieving these goals requires genome alignments that are as accurate as possible.
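The identity-threshold criterion mentioned above can be illustrated with a minimal Python sketch. It is not NPGe's implementation: the function names `percent_identity` and `is_putative_ortholog` are hypothetical, the two fragments are assumed to be already aligned to equal length (with '-' for gaps), gap columns are assumed to count as mismatches, and the 90% default is only the example value given in the text.

```python
def percent_identity(a: str, b: str) -> float:
    """Percent of identical columns between two aligned fragments.

    Assumption: both fragments come from the same alignment, so they have
    equal length; a column containing a gap ('-') counts as a mismatch.
    """
    if len(a) != len(b):
        raise ValueError("aligned fragments must have equal length")
    if not a:
        return 0.0
    # Count columns where both sequences carry the same nucleotide.
    matches = sum(
        1 for x, y in zip(a.upper(), b.upper()) if x == y and x != "-"
    )
    return 100.0 * matches / len(a)


def is_putative_ortholog(a: str, b: str, threshold: float = 90.0) -> bool:
    """Apply the identity threshold (hypothetical default: 90%)."""
    return percent_identity(a, b) >= threshold
```

For example, two 10-column fragments that differ in a single column have 90% identity and pass the default threshold, while fragments that differ in two columns (80%) do not.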
We specialize the definition of multiple genome alignment [1] to the case of closely related genomes and call the result a ‘nucleotide pangenome’. Definition. The nucleotide pangenome (NPG) of an input set of genomes is a set of aligned blocks, each composed of orthologous fragments. A block may contain fragments