Algorithmica (1999) 25: 330–346 © 1999 Springer-Verlag New York Inc.

An Algorithm for Identifying Similar Amino Acid Clusters among Different Alpha-Helical Coiled-Coil Proteins Using Their Secondary Structure¹

T. C. Ip,² V. A. Fischetti,³ and J. P. Schmidt²,⁴

Abstract. We describe a simple approach for finding identical amino acid clusters on the outer surface of α-helical coiled-coil proteins by examining the amino acid sequences of the proteins. Finding such similarities is an important immunological problem, since these clusters may correspond to cross-reactive epitopes, i.e., sites at which antibodies produced against one protein also bind to another, conformationally similar protein. Because of the regularities inherent in a coiled-coil structure, the position of each amino acid on the structure can be predicted. Based on this prediction, our algorithm finds similarities on the outer surface of the proteins. The matches found by our algorithm serve as an important screening step, indicating which experiments to conduct to determine the sites that correspond to cross-reactive epitopes. The locations of several cross-reactive epitopes between M proteins and myosins have been verified experimentally. Although our approach makes many simplifying assumptions, these epitopes always correspond to clusters of identical amino acids that our algorithm predicted to be contiguous on the outer surface. Our algorithm runs in O(n + m + r) time and O(n + m) space, where n and m are the lengths of the protein sequences and r is the number of matching amino acids that appear in the same structural position of the α-helix in both sequences.

Key Words. Computational biology, Protein structure, Coiled-coil, Epitope, Dynamic programming.

1. Introduction.
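To make the quantity r in the complexity bound above concrete, here is a minimal Python sketch. It is an illustration under stated assumptions, not the authors' algorithm. It assumes that each residue has already been labeled with its position, a through g, in the seven-residue coiled-coil repeat discussed below, and that the register (the label of the first residue) is known for each sequence. The helper names heptad_positions, matching_pairs, and group_by_diagonal are hypothetical. Bucketing the second sequence by (residue, position) lets every matching pair be listed exactly once, in expected O(n + m + r) time with hashing, which has the same form as the bound stated in the abstract.

```python
from collections import defaultdict

HEPTAD = "abcdefg"  # the seven positions of the coiled-coil repeat


def heptad_positions(length, first="a"):
    """Label residues 0..length-1 with heptad letters.

    Assumption (hypothetical helper): the register is known, i.e. residue 0
    occupies heptad position `first`.
    """
    start = HEPTAD.index(first)
    return [HEPTAD[(start + k) % 7] for k in range(length)]


def matching_pairs(seq1, pos1, seq2, pos2):
    """List every (i, j) with seq1[i] == seq2[j] and pos1[i] == pos2[j].

    Each of the r returned pairs is produced exactly once, so with hashing
    the expected running time is O(n + m + r).
    """
    # Bucket the indices of seq2 by (amino acid, heptad position): O(m).
    buckets = defaultdict(list)
    for j, key in enumerate(zip(seq2, pos2)):
        buckets[key].append(j)
    # Scan seq1 once and emit all partners in the matching bucket: O(n + r).
    pairs = []
    for i, key in enumerate(zip(seq1, pos1)):
        for j in buckets.get(key, ()):
            pairs.append((i, j))
    return pairs


def group_by_diagonal(pairs):
    """Group pairs by the offset i - j (one relative shift of the two helices)."""
    diagonals = defaultdict(list)
    for i, j in pairs:
        diagonals[i - j].append((i, j))
    return dict(diagonals)


if __name__ == "__main__":
    # Made-up toy sequences, not data from the paper.
    pairs = matching_pairs("LKELAEK", heptad_positions(7),
                           "AEKLKEL", heptad_positions(7, first="e"))
    print(group_by_diagonal(pairs))
    # {-3: [(0, 3), (1, 4), (2, 5), (3, 6)], 4: [(4, 0), (5, 1), (6, 2)]}
```

Because both sequences follow a fixed seven-residue register, every returned pair has the same value of i − j modulo 7 (here −3 and 4). Grouping by diagonal is only one simple way to organize the pairs; it is not necessarily how the paper defines a cluster on the outer surface.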
We describe the design and application of a simple and fast algorithm to identify identical clusters on the outer surface of two proteins whose conformations consist of a known recurring motif but whose three-dimensional coordinates are not known. A program based on our algorithm can be found at http://catt.poly.edu/jps. Our approach is particularly well suited to coiled-coil proteins, because the regularity and simplicity of their structure lend themselves to a simplified analysis. The coiled-coil was first characterized by Crick [C] and has since been extensively analyzed and recognized as the predominant feature in many proteins; see, for example, [OKKA], [LD], [CP], [FLSS], and [BW]. It has been observed that amino acid sequences that fold into a coiled-coil structure contain a seven-residue repeat of the form (a-b-c-d-e-f-g)_n, where the amino acids in positions a and d are hydrophobic and the intervening ones have a high helix potential, i.e., they are amino acids frequently found in α-helical structures.

¹ This work was supported in part by NSF Grants CCR-9305873 and HRD-9627109, and was in part carried out while J. P. Schmidt was a visiting professor at Stanford University.
² Department of Computer Science, Polytechnic University, 6 MetroTech, Brooklyn, NY 11201, USA. takip@photon.poly.edu; jps@pucs4.poly.edu.
³ The Rockefeller University, New York, NY 10021, USA. vaf@rockvax.rockefeller.edu.
⁴ Current address: Incyte Pharmaceuticals, 3174 Porter Drive, Palo Alto, CA 94304, USA. jschmidt@incyte.com.
Received June 7, 1997; revised March 23, 1998. Communicated by D. Gusfield and M.-Y. Kao.

The coiled-coil structure consists of two coils that wrap around each