Interdisciplinary Sciences: Computational Life Sciences
https://doi.org/10.1007/s12539-019-00326-x
ORIGINAL RESEARCH ARTICLE
Distribution of Distances Between Symmetric Words in the Human
Genome: Analysis of Regular Peaks
Carlos A. C. Bastos¹ · Vera Afreixo² · João M. O. S. Rodrigues¹ · Armando J. Pinho¹ · Raquel M. Silva³
Received: 24 August 2018 / Revised: 24 January 2019 / Accepted: 27 February 2019
© International Association of Scientists in the Interdisciplinary Areas 2019
Abstract
Finding DNA sites with high potential for the formation of hairpin/cruciform structures is an important task. Previous
work studied the distances between adjacent reversed complement words (symmetric word pairs) and also between non-adjacent
words. It was observed that for some words a few distances were favoured (peaks) and that in some distributions there was
strong peak regularity. The present work extends previous studies, by improving the detection and characterization of peak
regularities in the symmetric word pairs distance distributions of the human genome. This work also analyzes the location
of the sequences that originate the observed strong peak periodicity in the distance distribution. The results obtained in this
work may indicate genomic sites with potential for the formation of hairpin/cruciform structures.
Keywords Cruciform · Distance distribution · Genomic word · Reversed complements · Inverted repeats · Regular peaks
1 Introduction
Several genomic studies have focused on the analysis of
word counts and word distances, namely, phylogeny studies
[1], alignment-free methods [2, 3], CpG detection [4], coding
region detection [5] and DNA structure analysis
[6, 7].
In the context of DNA structure analysis, non-B confor-
mations have been shown to play an important role in DNA
damage and repair, genetic instability, gene regulation, and
chromatin architecture [8]. In particular, hairpin/cruciform
structures are important regulators of biological processes
and gene function [9].
Inverted repeats are a required feature of cruciform struc-
tures, but not all inverted repeats will form cruciforms. Cru-
ciforms are dynamic structures that may occur when certain
conditions are met, such as the coiling state of DNA, but are
less stable than the normal B-DNA conformation. Although
their properties and relevance in several biological processes
are acknowledged, in vivo evidence of their genomic localization
and mechanism of action is lacking [10, 11].
The stem and loop lengths of cruciform structures seem
to vary over a wide range. According to different authors,
the stem lengths vary between 6 and 100 nucleotides, while
loop lengths may range from 0 to 2000 nucleotides [12–14].
Shorter distances could favour the occurrence of these structures,
but long distances have also been reported, for example at the
translocation breakpoints associated with human developmental
diseases or infertility [10].
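To make the stem and loop terminology concrete, the following is a minimal Python sketch, not the method used in this work. It assumes exact matching (no mismatches in the stem) and a sequence over the alphabet A, C, G, T; the function names `reverse_complement` and `inverted_repeats` and the parameters `stem` and `max_loop` are hypothetical and introduced here only for illustration.

```python
# Complement table for the four DNA nucleotides (assumes upper-case A, C, G, T only).
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(word: str) -> str:
    """Return the reversed complement of a DNA word."""
    return word.translate(COMPLEMENT)[::-1]


def inverted_repeats(seq: str, stem: int, max_loop: int):
    """Yield (start, loop_length) for every word of length `stem` that is
    followed, after a gap of at most `max_loop` nucleotides, by its
    reversed complement. Exact matching is an illustrative assumption."""
    for i in range(len(seq) - 2 * stem + 1):
        target = reverse_complement(seq[i:i + stem])
        first = i + stem                              # loop length 0
        last = min(first + max_loop, len(seq) - stem)  # stay inside the sequence
        for j in range(first, last + 1):
            if seq[j:j + stem] == target:
                yield i, j - first


# Toy sequence: the word ACGTTG, a 3-nucleotide loop, then its reversed complement.
print(list(inverted_repeats("ACGTTGAAACAACGT", stem=6, max_loop=10)))
# prints [(0, 3)]
```

The brute-force scan is adequate for short sequences only; at genome scale an index of word positions would be needed, which is outside the scope of this sketch.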
Computational techniques have been used to identify
DNA motifs that are known to potentially form non-B DNA
structures [6, 14]. A DNA word analysis based on the distri-
bution of the distances between adjacent symmetric words
of length seven [7] showed a strong over-representation of
distances up to 350, a feature that the authors considered
might be associated with the potential for the occurrence
of cruciform structures. Recently, the same research group
extended their analysis to include distance distributions
* Carlos A. C. Bastos
  cbastos@ua.pt
1 Department of Electronics, Telecommunications and Informatics, IEETA-Institute of Electronics and Informatics Engineering of Aveiro, University of Aveiro, Campus Universitário de Santiago, Aveiro, Portugal
2 Department of Mathematics, IEETA-Institute of Electronics and Informatics Engineering of Aveiro, CIDMA-Center for Research and Development in Mathematics and Applications, University of Aveiro, Campus Universitário de Santiago, Aveiro, Portugal
3 Department of Medical Sciences, iBiMED, IEETA-Institute of Electronics and Informatics Engineering of Aveiro, University of Aveiro, Campus Universitário de Santiago, Aveiro, Portugal