Computational Biology and Chemistry 28 (2004) 281–290

Fast and high precision algorithms for optimization in large-scale genomic problems

D.I. Mester, Y.I. Ronin, E. Nevo, A.B. Korol

Institute of Evolution, University of Haifa, Haifa 31905, Israel

Received 5 July 2004; received in revised form 16 August 2004; accepted 16 August 2004

Abstract

There are several very difficult problems related to genetic or genomic analysis that belong to the field of discrete optimization in a set of all possible orders. With n elements (points, markers, clones, sequences, etc.), the number of all possible orders is n!/2 and only one of these is considered to be the true order. A classical formulation of a similar mathematical problem is the well-known traveling salesperson problem model (TSP). Genetic analogues of this problem include: ordering in multilocus genetic mapping, evolutionary tree reconstruction, building physical maps (contig assembling for overlapping clones and radiation hybrid mapping), and others. A novel, fast and reliable hybrid algorithm based on evolution strategy and guided local search discrete optimization was developed for TSP formulation of the multilocus mapping problems. High performance and high precision of the employed algorithm named guided evolution strategy (GES) allows verification of the obtained multilocus orders based on different computing-intensive approaches (e.g., bootstrap or jackknife) for detection and removing unreliable marker loci, hence, stabilizing the resulting paths. The efficiency of the proposed algorithm is demonstrated on standard TSP problems and on simulated data of multilocus genetic maps up to 1000 points per linkage group.

© 2004 Elsevier Ltd. All rights reserved.

Keywords: Discrete optimization; Fast algorithm; Multilocus mapping

1. Introduction

The paper is devoted to genomic problems related to unidimensional ordering of many elements such as markers, clones, SNP sites, etc.
With n such elements, the number of all possible orders is n!/2, of which only one is considered the true order. Several genetic problems can be formulated as multipoint unidimensional ordering: multilocus genetic mapping, building physical maps (contig assembling for overlapping clones and radiation hybrid mapping), and others. Despite variation among possible optimization criteria, the unidimensional genetic or genomic ordering problems are quite similar to the well-known and challenging traveling salesperson problem (TSP). More precisely, the multilocus ordering problem is formally equivalent to the wandering salesperson problem (WSP). WSP is a particular case of the traveling salesperson problem in which the salesperson can start wherever he or she wishes and does not have to return to the starting city after visiting all cities (Papadimitriou and Steiglitz, 1981). Moreover, the genetic ordering problems are “unidimensional” WSPs (UWSP), because all elements to be ordered lie on a single coordinate axis. Nonetheless, in this paper we use both terms, TSP and UWSP, because TSP is the term used in most genetics articles.

Several authors have employed methods developed for TSP in genetic and physical mapping (Weeks and Lange, 1987; Falk, 1992; Mott et al., 1993; Schiex and Gaspin, 1997; Hall et al., 2001). For example, in mapping the natural assumption is that, for a set of linked loci, the true order is the one that minimizes the total length of the linkage group. One way to address this problem is to recover the marker order from a known matrix d_ij of pairwise marker distances. A special case of the problem may include restrictions on the order of some (anchor) markers.

Corresponding author. Tel.: +972 48240 449; fax: +972 48246 554. E-mail address: korol@esti.haifa.ac.il (A.B. Korol).
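To make the distance-matrix formulation concrete, the following is a minimal Python sketch, not the GES algorithm of this paper. It assumes a hypothetical set of five marker positions on one axis, builds d_ij as absolute differences, and finds the order of minimum total length by exhaustive search over the n!/2 orders (an order and its mirror image are counted once). The function names `path_length`, `count_orders`, and `brute_force_order` are illustrative only.

```python
from itertools import permutations
from math import factorial


def path_length(order, d):
    """Total length of an open path: sum of distances between adjacent elements."""
    return sum(d[a][b] for a, b in zip(order, order[1:]))


def count_orders(n):
    """Number of distinct orders of n elements, counting an order and its reverse once."""
    return factorial(n) // 2 if n > 1 else 1


def brute_force_order(d):
    """Exhaustive search over all n!/2 orders; only feasible for very small n."""
    n = len(d)
    best_order, best_len = None, float("inf")
    for perm in permutations(range(n)):
        if n > 1 and perm[0] > perm[-1]:
            continue  # mirror image of an order that is examined elsewhere
        length = path_length(perm, d)
        if length < best_len:
            best_order, best_len = perm, length
    return best_order, best_len


# Hypothetical marker positions on a single axis, given in shuffled order.
positions = [30.0, 0.0, 12.0, 45.0, 21.0]
# Pairwise distance matrix d_ij = |x_i - x_j|.
d = [[abs(x - y) for y in positions] for x in positions]

order, length = brute_force_order(d)
print(order, length)     # markers listed by increasing position
print(count_orders(5))   # number of orders examined for n = 5
print(count_orders(50))  # size of the search space for n = 50
```

For five markers only 60 orders need to be examined, but for n = 50 the count exceeds 10^64, which is why exhaustive enumeration is not an option for realistic linkage groups and heuristic optimization is used instead.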
1476-9271/$ – see front matter. doi:10.1016/j.compbiolchem.2004.08.003

A primary difficulty in ordering genetic loci using linkage analysis is the large number of possible orders: even for n = 50, it would not be feasible to find the exact solution by direct