Maximum Contact Map Overlap Revisited

RUMEN ANDONOV,¹ NOËL MALOD-DOGNIN,¹ and NICOLA YANEV²

ABSTRACT

Among the measures for quantifying the similarity between three-dimensional (3D) protein structures, maximum contact map overlap (CMO) has received sustained attention during the past decade. Despite this, the known algorithms exhibit modest performance and are not applicable to large-scale comparison. This article offers a clear advance in this respect. We present a new integer programming model for CMO and propose an exact branch-and-bound algorithm with bounds obtained by a novel Lagrangian relaxation. The efficiency of the approach is demonstrated on a popular small benchmark (the Skolnick set, 40 domains). On this set, our algorithm significantly outperforms the best existing exact algorithms. Many hard CMO instances have been solved for the first time. To further assess our approach, we constructed a large-scale set of 300 protein domains. Computing the similarity measure for each of the 44,850 pairs, we obtained a classification in excellent agreement with SCOP. Supplementary Material is available at www.liebertonline.com/cmb.

Key words: branch-and-bound, combinatorial optimization, contact map overlap, integer programming, Lagrangian relaxation, protein structure alignment.

1. INTRODUCTION

A fruitful assumption in molecular biology is that proteins with similar three-dimensional (3D) structures are likely to share a common function and, in most cases, to derive from a common ancestor. Computing the similarity between two protein structures is therefore a crucial task and has been extensively investigated (Agarwal et al., 2007; Caprara and Lancia, 2002; Caprara et al., 2004; Carr and Lancia, 2004; Lancia and Istrail, 2004; Gibrat et al., 1996; Godzik, 1996; Godzik and Skolnick, 1994). Since it is not clear which quantitative measure to use for comparing protein structures, a multitude of measures have been proposed.
Each measure aims at capturing the intuitive notion of similarity. We study here the contact map overlap (CMO), a scoring scheme first proposed by Godzik and Skolnick (1994). This measure is robust, takes partial matching into account, is translation-invariant, and captures the intuitive notion of similarity very well.

¹INRIA Rennes–Bretagne Atlantique, and University of Rennes 1, Rennes, France.
²Faculty of Mathematics and Informatics, University of Sofia, and Institute of Mathematics and Informatics, Bulgarian Academy of Sciences, Sofia, Bulgaria.

JOURNAL OF COMPUTATIONAL BIOLOGY, Volume 18, Number 1, 2011, pp. 27–41. © Mary Ann Liebert, Inc. DOI: 10.1089/cmb.2009.0196

The primary structure of a protein is the linear arrangement of its residues (amino acids). Under specific physiological conditions, this linear chain folds and adopts a complex 3D shape called the tertiary structure. In this folded state, residues that are far apart along the chain may come into proximity in 3D space and form contacts. This proximity relation is captured by a contact map. The contact