Interdisciplinary Sciences: Computational Life Sciences
https://doi.org/10.1007/s12539-019-00326-x

ORIGINAL RESEARCH ARTICLE

Distribution of Distances Between Symmetric Words in the Human Genome: Analysis of Regular Peaks

Carlos A. C. Bastos 1 · Vera Afreixo 2 · João M. O. S. Rodrigues 1 · Armando J. Pinho 1 · Raquel M. Silva 3

Received: 24 August 2018 / Revised: 24 January 2019 / Accepted: 27 February 2019
© International Association of Scientists in the Interdisciplinary Areas 2019

Abstract
Finding DNA sites with high potential for the formation of hairpin/cruciform structures is an important task. Previous works studied the distances between adjacent reversed complement words (symmetric word pairs) and also between non-adjacent words. It was observed that for some words a few distances were favoured (peaks) and that some distributions showed strong peak regularity. The present work extends previous studies by improving the detection and characterization of peak regularities in the symmetric word pair distance distributions of the human genome. This work also analyzes the location of the sequences that give rise to the strong peak periodicity observed in the distance distributions. The results obtained in this work may indicate genomic sites with potential for the formation of hairpin/cruciform structures.

Keywords Cruciform · Distance distribution · Genomic word · Reversed complements · Inverted repeats · Regular peaks

1 Introduction

Several genomic studies have focused on the analysis of word counts and word distances, namely, phylogeny studies [1], alignment-free methods [2, 3], CpG detection [4], coding region detection [5] and other DNA structure analyses [6, 7].

In the context of DNA structure analysis, non-B conformations have been shown to play an important role in DNA damage and repair, genetic instability, gene regulation, and chromatin architecture [8].
In particular, hairpin/cruciform structures are important regulators of biological processes and gene function [9]. Inverted repeats are a required feature of cruciform structures, but not all inverted repeats will form cruciforms. Cruciforms are dynamic structures that may occur when certain conditions are met, such as the coiling state of DNA, but are less stable than the normal B-DNA conformation. Although their properties and relevance in several biological processes are acknowledged, in vivo evidence of their genomic localization and mechanism of action is lacking [10, 11].

The stem and loop lengths of cruciform structures seem to vary over a wide range. According to different authors, the stem lengths vary between 6 and 100 nucleotides, while loop lengths may range from 0 to 2000 nucleotides [12–14]. Shorter distances could favour the occurrence of these structures, but long distances have also been reported, such as the translocation breakpoints associated with human developmental diseases or infertility [10].

Computational techniques have been used to identify DNA motifs that are known to potentially form non-B DNA structures [6, 14]. A DNA word analysis based on the distribution of the distances between adjacent symmetric words of length seven [7] showed a strong over-representation of distances up to 350, a feature that the authors considered might be associated with the potential for the occurrence of cruciform structures. Recently, the same research group extended their analysis to include distance distributions

* Carlos A. C. Bastos
  cbastos@ua.pt

1 Department of Electronics, Telecommunications and Informatics, IEETA-Institute of Electronics and Informatics Engineering of Aveiro, University of Aveiro, Campus Universitário de Santiago, Aveiro, Portugal
2 Department of Mathematics, IEETA-Institute of Electronics and Informatics Engineering of Aveiro, CIDMA-Center for Research and Development in Mathematics and Applications, University of Aveiro, Campus Universitário de Santiago, Aveiro, Portugal
3 Department of Medical Sciences, iBiMED, IEETA-Institute of Electronics and Informatics Engineering of Aveiro, University of Aveiro, Campus Universitário de Santiago, Aveiro, Portugal
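
To make the notion of a distance distribution between adjacent symmetric words concrete, the following Python sketch counts, for a word w, the distance from each occurrence of w to the nearest following occurrence of its reversed complement. It is an illustration only, not the authors' implementation: the function names are hypothetical, the distance is assumed to be the difference between start positions, and the sequence is assumed to contain only uppercase A, C, G and T.

```python
from bisect import bisect_right
from collections import Counter

# Translation table mapping each nucleotide to its complement.
_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reversed_complement(word):
    """Return the reversed complement of a DNA word (e.g. AAC -> GTT)."""
    return word.translate(_COMPLEMENT)[::-1]


def find_occurrences(sequence, word):
    """Return the start positions of all occurrences of word, overlaps included."""
    positions = []
    start = sequence.find(word)
    while start != -1:
        positions.append(start)
        start = sequence.find(word, start + 1)
    return positions


def symmetric_pair_distances(sequence, word):
    """Count distances from each occurrence of word to the nearest
    following occurrence of its reversed complement.

    Assumption (not taken from the paper): distance is the difference
    between the start positions of the two words.
    """
    partner_positions = find_occurrences(sequence, reversed_complement(word))
    distances = Counter()
    for pos in find_occurrences(sequence, word):
        # Index of the first partner occurrence starting strictly after pos.
        i = bisect_right(partner_positions, pos)
        if i < len(partner_positions):
            distances[partner_positions[i] - pos] += 1
    return distances
```

For example, in the toy sequence `AACGTTAACCCGTT` the word `AAC` starts at positions 0 and 6 and its reversed complement `GTT` at positions 3 and 11, so `sorted(symmetric_pair_distances("AACGTTAACCCGTT", "AAC").items())` gives `[(3, 1), (5, 1)]`. Peaks in such a histogram, computed over a whole genome, are the favoured distances discussed in the text.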