Physica A 389 (2010) 3007–3012

Contents lists available at ScienceDirect

Physica A

journal homepage: www.elsevier.com/locate/physa

Sequence alignment using simulated annealing

Ozan S. Sarıyer a,*, Can Güven b

a Department of Physics, Koç University, Sarıyer 34450, Istanbul, Turkey
b Department of Physics, University of Maryland, College Park, MD 20742, United States

Article info
Article history: Received 5 February 2010; Available online 12 February 2010
Keywords: Bioinformatics; Simulated annealing

Abstract
We apply simulated annealing to amino acid sequence alignment, a fundamental problem in bioinformatics that is particularly relevant to evolution. Our goal was to obtain results comparable to those reached through dynamic programming algorithms, such as the Needleman–Wunsch algorithm, and to make a connection between physics and bioinformatics through a representative example.
© 2010 Elsevier B.V. All rights reserved.

1. Introduction

The Metropolis algorithm introduced in Ref. [1] was developed by Kirkpatrick et al. in Ref. [2] into the simulated annealing method, which reveals connections between statistical mechanics and combinatorial optimization by introducing a temperature-like variable that enables an efficient search for the global optimum. Numerous reviews discuss simulated annealing in great detail [3–7]. Over the last two decades, simulated annealing methods have been applied to several central problems of bioinformatics, including phylogenetic tree search [8], homology modeling [9], improvement of threading-based protein models [10], secondary structure alignment [11,12], tertiary structure prediction [13–15], and RNA/DNA/protein multiple and pairwise sequence alignment [11,16–21].
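For orientation, the temperature-like variable enters through the acceptance rule of the Metropolis algorithm. In its standard textbook form, a proposed change that alters the cost by ΔE at temperature T is accepted with the probability given below. The symbols ΔE, T and P_accept are notation assumed here for illustration, with temperature measured in units of the cost, and are not taken from this paper.

```latex
\[
  P_{\mathrm{accept}}(\Delta E, T) = \min\!\left(1,\; e^{-\Delta E / T}\right)
\]
```

At high T nearly every change is accepted, so the search moves freely. As T approaches zero only cost-lowering changes survive, and the search settles into a minimum.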
In this paper, we investigated whether and how simulated annealing can be applied to the sequence alignment problem, for which the widely accepted method of solution is an application of dynamic programming, namely the Needleman–Wunsch algorithm [22], which has time complexity O(N^2) for aligning two sequences, both of length N. We studied the case of equal sequence lengths for simplicity; the procedure generalizes readily to different sequence lengths. Our time complexity analysis suggests that simulated annealing performs better than the Needleman–Wunsch algorithm for sequences longer than median protein lengths, for which the deviation from the optimal alignment cost saturates at a fair value. It should be noted that the Needleman–Wunsch algorithm yields the exact optimal alignment but cannot be efficiently extended to multiple sequence alignment, whereas this extension is easily implemented for simulated annealing.

2. Simulated annealing algorithm

The idea of annealing, a technique used in metallurgy, can be exploited to optimize a more general system. Annealing in metal processing starts by raising the temperature to a very high level, where the crystal structure breaks down and the atoms can rearrange. As the temperature is lowered, the atoms tend to form the optimal crystalline structure, since this structure is the lowest-energy physical configuration. Given enough relaxation time, the impurities in the system settle in positions where they do not cause frustration. A slow rate of cooling is crucial, because the kinetic energy, which is directly related to the temperature, is responsible for the motion of the atoms.

* Corresponding author.
E-mail address: sariyer@itu.edu.tr (O.S. Sarıyer).
0378-4371/$ – see front matter © 2010 Elsevier B.V. All rights reserved.
doi:10.1016/j.physa.2010.02.015
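The annealing picture described in Section 2 can be sketched as a generic optimization loop. The following is a minimal illustration under stated assumptions, not the implementation used in this paper: the function names (`anneal`, `toy_energy`, `toy_neighbor`), the geometric cooling schedule, and all parameter values are hypothetical choices, and the one-dimensional toy cost stands in for an alignment cost.

```python
import math
import random


def anneal(energy, neighbor, state, t_start=10.0, t_min=1e-9,
           cooling=0.99, steps=5000, seed=0):
    """Minimize `energy` by simulated annealing (illustrative sketch only)."""
    rng = random.Random(seed)  # seeded so that runs are reproducible
    current, current_e = state, energy(state)
    best, best_e = current, current_e
    temperature = t_start
    for _ in range(steps):
        candidate = neighbor(current, rng)
        delta = energy(candidate) - current_e
        # Metropolis rule: always accept a move that does not raise the cost;
        # accept a cost-raising move with probability exp(-delta / T).
        if delta <= 0 or rng.random() < math.exp(-delta / temperature):
            current, current_e = candidate, current_e + delta
            if current_e < best_e:
                best, best_e = current, current_e
        # Assumed geometric cooling; the floor t_min avoids division by zero.
        temperature = max(temperature * cooling, t_min)
    return best, best_e


def toy_energy(x):
    """Hypothetical cost with a single minimum at x = 3."""
    return (x - 3) ** 2


def toy_neighbor(x, rng):
    """Hypothetical move: step one unit left or right."""
    return x + rng.choice((-1, 1))
```

Calling `anneal(toy_energy, toy_neighbor, 50)` returns the best state visited together with its cost. The slow decay of `temperature` plays the role of the slow cooling emphasized above: early on, cost-raising moves are accepted often and the state moves freely, while late in the run the loop accepts almost only cost-lowering moves.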