Hamming Distance as a Concept in DNA Molecular Recognition
Mina Mohammadi-Kambs,*,† Kathrin Hölz,‡ Mark M. Somoza,‡ and Albrecht Ott†
† Biological Experimental Physics, Saarland University, Campus B2.1, 66123 Saarbrücken, Germany
‡ Institute of Inorganic Chemistry, Faculty of Chemistry, University of Vienna, Althanstraße 14 (UZA II), 1090 Vienna, Austria
Supporting Information
ABSTRACT: DNA microarrays constitute an in vitro example system of a highly crowded molecular recognition environment. Although they are widely used in many biological applications, some basic mechanisms of DNA hybridization remain poorly understood. On a microarray, cross-hybridization arises from sequence similarities and may introduce errors during the transmission of information. We experimentally determine an appropriate minimum Hamming distance, that is, the minimum number of positions at which any two sequences of a set must differ. By applying an algorithm based on a graph-theoretical method, we find large orthogonal sets of sequences that are sufficiently different that they exhibit no cross-hybridization. To create such a set, we first derive an analytical solution for the number of sequences of a given length that contain at least four guanines in a row, and we eliminate these sequences from the list of candidates. We experimentally confirm the orthogonality of the largest possible set, which contains 23 sequences of length 7. We anticipate that our work will be a starting point for the study of signal propagation in highly competitive environments, in addition to its obvious application in high-throughput DNA experiments.
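The abstract refers to an analytical count of sequences that contain at least four guanines in a row. The authors' formula is not reproduced in this section, so the following is only a minimal sketch of one standard way to obtain the same number, assuming a four-letter alphabet A, C, G, T: a dynamic program that tracks the length of the trailing G run and counts the sequences that never reach four consecutive G's. The function names `count_with_g_run` and `brute_force` are hypothetical.

```python
from itertools import product

def count_with_g_run(length: int, run: int = 4) -> int:
    """Count DNA sequences of `length` that contain at least `run` consecutive G's.

    Dynamic program: counts[k] is the number of prefixes that have not yet
    produced a G run of length `run` and currently end in exactly k G's.
    """
    counts = [1] + [0] * (run - 1)  # the empty prefix ends in 0 G's
    for _ in range(length):
        new = [0] * run
        new[0] = 3 * sum(counts)     # appending A, C or T resets the G run
        for k in range(run - 1):
            new[k + 1] = counts[k]   # appending G extends the run; reaching `run` is dropped
        counts = new
    return 4 ** length - sum(counts)  # all sequences minus those that avoid the run

def brute_force(length: int, run: int = 4) -> int:
    """Exhaustive cross-check, feasible only for short sequences."""
    motif = "G" * run
    return sum(motif in "".join(s) for s in product("ACGT", repeat=length))

print(count_with_g_run(7))  # 208
```

Under this criterion alone, 208 of the 4^7 = 16384 sequences of length 7 would be removed. The dynamic program runs in time linear in the length, whereas the exhaustive check grows as 4^L and serves only to validate the sketch for small lengths.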
INTRODUCTION
Molecular recognition in the crowded environment of DNA microarrays plays an important role in information processing. Recognition often requires discriminating one specific molecule from among many similar, competing molecules. In 1894, Emil Fischer proposed the lock and key model to describe the recognition between an enzyme and its substrate.(1) According to this model, the substrate has exactly the right size and shape to fit into the active site of its complementary enzyme. However, in crowded environments, noncomplementary molecules may also bind to each other and thereby introduce errors. For DNA, specific binding of two single strands, that is, the formation of a stable double helix, occurs only if A pairs with T and C pairs with G along the sequence. DNA microarrays are a widely used platform that, besides its many applications in medicine and biology, enables the study of the fundamentals of DNA hybridization.(2-10)
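The base-pairing rule above can be made concrete in a few lines. In this sketch, the helper names are hypothetical and sequences are assumed to be written 5'→3' using only A, C, G, and T. Because the two strands of a double helix are antiparallel, the perfectly matching target of a probe is its reverse complement.

```python
# Watson-Crick pairing rules: A <-> T and C <-> G.
COMPLEMENT = str.maketrans("ACGT", "TGCA")

def reverse_complement(seq: str) -> str:
    """Return the fully complementary strand, written 5'->3'.

    Assumes `seq` is written 5'->3' and uses only A, C, G, T;
    any other character is passed through unchanged by str.translate.
    """
    return seq.upper().translate(COMPLEMENT)[::-1]

def is_perfect_match(probe: str, target: str) -> bool:
    """True if target pairs with probe at every position (antiparallel duplex)."""
    return len(probe) == len(target) and reverse_complement(probe) == target.upper()

print(reverse_complement("ACGTTGC"))           # GCAACGT
print(is_perfect_match("ACGTTGC", "GCAACGT"))  # True
print(is_perfect_match("ACGTTGC", "GCAACGA"))  # False
```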
These microarrays consist of single-stranded DNA oligonucleotides (probes) immobilized on a surface. When these probes are exposed to a bulk mixture of fluorescently labeled target sequences, ideally only complementary targets hybridize. However, a probe still hybridizes to noncomplementary targets, albeit with a lower binding affinity than to its perfectly matching sequence. Therefore, similarities among probes can lead to a significant amount of nonspecific cross-hybridization. On a DNA microarray exposed to complex target mixtures, imperfect recognition introduces noise and makes results difficult to interpret.
The kinetics of hybridization in the presence of competitors and the importance of cross-hybridization for the quantitative interpretation of microarray data have been studied intensively,(11-13) especially for single-nucleotide polymorphism detection and the accurate assessment of gene expression levels.(14-17)
One strategy to avoid cross-hybridization is to construct sets of probes with minimal pairwise competition. Such probes are often referred to as orthogonal. Previous theoretical studies(18-24) developed different strategies for finding sets of orthogonal sequences. The most intuitive approach to deciding which sequences cross-hybridize is based on the free energy difference between perfectly matched and mismatched hybridization.(25) However, free energy estimates have led to poor predictions of hybridization intensities on microarrays.(26)
In this work, we apply a well-known local search algorithm and implement graph-theoretical methods to find such sets. Following the concept of Hamming distance from coding theory, we consider two sequences not to cross-hybridize if they differ in at least a certain number of positions. This threshold is called the minimum Hamming distance, d.(27)
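To make the Hamming-distance criterion and its graph formulation concrete, the sketch below treats every candidate sequence as a vertex and considers two vertices adjacent when their Hamming distance is at least d; any set of mutually adjacent vertices (a clique) is then an orthogonal set under this criterion. This section does not specify the local search algorithm used in this work, so the sketch substitutes a plain greedy heuristic. It returns a clique that cannot be extended further but is not guaranteed to be the largest possible one. The helper names and the toy parameters (length 4, d = 3) are illustrative assumptions.

```python
from itertools import product

def hamming(a: str, b: str) -> int:
    """Number of positions at which two equal-length sequences differ."""
    if len(a) != len(b):
        raise ValueError("Hamming distance is defined only for equal lengths")
    return sum(x != y for x, y in zip(a, b))

def greedy_orthogonal_set(candidates: list[str], d: int) -> list[str]:
    """Greedy clique construction in the implicit 'distance >= d' graph.

    Each candidate is a vertex; two vertices are adjacent when their Hamming
    distance is at least d. A candidate is kept only if it is adjacent to
    every sequence kept so far, so the result is a clique that cannot be
    extended, but it is not guaranteed to be the largest one.
    """
    chosen: list[str] = []
    for seq in candidates:
        if all(hamming(seq, other) >= d for other in chosen):
            chosen.append(seq)
    return chosen

# Toy example: all 4**4 = 256 sequences of length 4, with d = 3.
candidates = ["".join(p) for p in product("ACGT", repeat=4)]
code = greedy_orthogonal_set(candidates, d=3)
print(len(code), code[:3])
```

The result of the greedy pass depends on the order of the candidates, which is one reason why local search or exact clique methods are used when the goal is the largest possible set.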
We determine a suitable d experimentally. One of the fundamental problems in coding theory is finding the maximum size of a code, where a code is a set of codewords of length L with minimum Hamming distance d.(28)
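The coding-theory problem just stated has classical upper bounds on the maximum size A_q(L, d) of a code over a q-letter alphabet. As a hedged illustration, with q = 4 for DNA and d = 4 chosen only as an example value (not the distance determined in this work), the Singleton and sphere-packing bounds can be computed as follows. Removing candidate sequences, for example those with long G runs, can only shrink the pool, so these bounds remain valid upper limits for such filtered sets, although they need not be tight.

```python
from math import comb

def singleton_bound(q: int, length: int, d: int) -> int:
    """Singleton bound: A_q(length, d) <= q**(length - d + 1)."""
    return q ** (length - d + 1)

def sphere_packing_bound(q: int, length: int, d: int) -> int:
    """Sphere-packing (Hamming) bound: A_q(length, d) <= q**length / V,
    where V counts the words within Hamming distance t = (d - 1) // 2 of a codeword."""
    t = (d - 1) // 2
    volume = sum(comb(length, i) * (q - 1) ** i for i in range(t + 1))
    return q ** length // volume  # code sizes are integers, so round down

# DNA alphabet (q = 4), length 7; d = 4 is an illustrative value only.
print(singleton_bound(4, 7, 4))       # 256
print(sphere_packing_bound(4, 7, 4))  # 744
```

Whichever bound is smaller applies; for these example parameters it is the Singleton bound.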
In analogy, here, we
Received: January 14, 2017
Accepted: March 17, 2017
Published: April 5, 2017
Article
http://pubs.acs.org/journal/acsodf
© 2017 American Chemical Society 1302 DOI: 10.1021/acsomega.7b00053
ACS Omega 2017, 2, 1302-1308
This is an open access article published under a Creative Commons Attribution (CC-BY)
License, which permits unrestricted use, distribution and reproduction in any medium,
provided the author and source are cited.