Computational Biology and Chemistry 28 (2004) 281–290
Fast and high precision algorithms for optimization
in large-scale genomic problems
D.I. Mester, Y.I. Ronin, E. Nevo, A.B. Korol∗
Institute of Evolution, University of Haifa, Haifa 31905, Israel
Received 5 July 2004; received in revised form 16 August 2004; accepted 16 August 2004
Abstract
Several very difficult problems in genetic or genomic analysis belong to the field of discrete optimization over the set
of all possible orders. With n elements (points, markers, clones, sequences, etc.), the number of all possible orders is n!/2 and only one of
these is considered to be the true order. A classical formulation of a similar mathematical problem is the well-known traveling salesperson
problem model (TSP). Genetic analogues of this problem include: ordering in multilocus genetic mapping, evolutionary tree reconstruction,
building physical maps (contig assembling for overlapping clones and radiation hybrid mapping), and others. A novel, fast and reliable hybrid
algorithm based on evolution strategy and guided local search discrete optimization was developed for TSP formulation of the multilocus
mapping problems. The high performance and high precision of the employed algorithm, named guided evolution strategy (GES), allow verification
of the obtained multilocus orders by different computing-intensive approaches (e.g., bootstrap or jackknife) for detecting and removing
unreliable marker loci, hence stabilizing the resulting paths. The efficiency of the proposed algorithm is demonstrated on standard TSP
problems and on simulated data of multilocus genetic maps up to 1000 points per linkage group.
© 2004 Elsevier Ltd. All rights reserved.
Keywords: Discrete optimization; Fast algorithm; Multilocus mapping
∗ Corresponding author. Tel.: +972 48240 449; fax: +972 48246 554.
E-mail address: korol@esti.haifa.ac.il (A.B. Korol).
1. Introduction
This paper is devoted to genomic problems related to the
unidimensional ordering of many elements such as markers,
clones, SNP sites, etc. With n such elements, the number of all
possible orders is n!/2, of which only one is considered
the true order. Several genetic problems can be formulated
as multipoint unidimensional ordering: multilocus
genetic mapping, building physical maps (contig assembling
for overlapping clones and radiation hybrid mapping), and
others. Despite variation among possible optimization criteria,
the unidimensional genetic or genomic ordering problems
are quite similar to the well-known and challenging traveling
salesperson problem (TSP). More precisely, the multilocus
ordering problem is formally equivalent to the wandering
salesperson problem (WSP). WSP is a particular case of the
traveling salesperson problem in which the salesperson can
start wherever he or she wishes and does not have to return
to the starting city after visiting all cities (Papadimitriou and
Steiglitz, 1981). In addition, the genetic ordering problems
are “unidimensional” WSP (UWSP) because all elements to be
ordered are placed on a single coordinate axis. Nonetheless, in
this paper we use both terms, TSP and UWSP, because
TSP is the term used in most genetics articles.
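To make the size of the search space concrete, the following minimal Python sketch counts the distinct orders of n elements when an order and its reverse are treated as the same map, which is where the n!/2 figure comes from. The function names are illustrative assumptions and do not appear in the paper; the brute-force enumeration is included only to check the formula for very small n.

```python
from itertools import permutations
from math import factorial


def count_orders(n: int) -> int:
    """Number of distinct orders of n elements when an order and its
    reverse are considered identical (n!/2 for n >= 2)."""
    if n < 2:
        return 1  # a single element (or none) has exactly one order
    return factorial(n) // 2


def enumerate_orders(n: int) -> list:
    """Brute-force enumeration that keeps one representative of each
    order/reverse pair. Feasible only for very small n."""
    seen = set()
    for p in permutations(range(n)):
        # An order and its mirror image describe the same map,
        # so store only the lexicographically smaller of the two.
        seen.add(min(p, p[::-1]))
    return sorted(seen)


if __name__ == "__main__":
    # The closed form agrees with explicit enumeration for small n.
    for n in (3, 5, 8):
        assert len(enumerate_orders(n)) == count_orders(n)
    # The count grows too fast for enumeration at realistic sizes.
    for n in (10, 20, 50):
        print(n, count_orders(n))
```

Already at n = 10 there are 1,814,400 distinct orders, so explicit enumeration is useful only as a sanity check.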
Several authors have applied methods developed for TSP
to genetic and physical mapping (Weeks and Lange, 1987;
Falk, 1992; Mott et al., 1993; Schiex and Gaspin, 1997; Hall
et al., 2001). For example, in mapping, the natural assumption
is that for a set of linked loci the true order is the one
that minimizes the total length of the linkage group.
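The minimum-length criterion can be illustrated with a small sketch. It assumes a symmetric matrix d of pairwise distances and scores an order by the length of the open path through its elements (no return to the start, as in the WSP). The marker positions, function names, and exhaustive search are hypothetical choices made for illustration only; this is not the GES algorithm described in the paper, and exhaustive search is practical only for a handful of elements.

```python
from itertools import permutations


def path_length(order, d):
    """Total length of an open path (no return to the start) that
    visits the elements in the given order; d is a symmetric matrix
    of pairwise distances."""
    return sum(d[a][b] for a, b in zip(order, order[1:]))


def best_order_exhaustive(d):
    """Exhaustive search over all orders of n >= 2 elements.
    Only one of each order/reverse pair is examined."""
    n = len(d)
    candidates = (p for p in permutations(range(n)) if p[0] < p[-1])
    best = min(candidates, key=lambda p: path_length(p, d))
    return best, path_length(best, d)


if __name__ == "__main__":
    # Hypothetical marker positions on one axis (arbitrary units),
    # listed in scrambled order.
    positions = [12.0, 0.0, 30.0, 7.5, 21.0]
    d = [[abs(x - y) for y in positions] for x in positions]

    order, length = best_order_exhaustive(d)
    print("best order:", order)
    print("total length:", length)
```

For points that lie exactly on one axis, the shortest open path visits them in sorted position order, so here the minimum-length order coincides with the true order; with noisy distance estimates this need not hold, which motivates the verification procedures mentioned in the abstract.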
One way of addressing this problem is to recover
the marker order from a known matrix $d_{ij}$ of pairwise
marker distances. A special case of the problem may include
restrictions on the order of some (anchor) markers. A
primary difficulty in ordering genetic loci using linkage
analysis is the large number of possible orders: even for n ∼ 50,
it would not be feasible to find the exact solution by direct
1476-9271/$ – see front matter © 2004 Elsevier Ltd. All rights reserved.
doi:10.1016/j.compbiolchem.2004.08.003