Algorithmica (1999) 25: 330–346
© 1999 Springer-Verlag New York Inc.
An Algorithm for Identifying Similar Amino Acid
Clusters among Different Alpha-Helical Coiled-Coil
Proteins Using Their Secondary Structure¹
T. C. Ip,² V. A. Fischetti,³ and J. P. Schmidt²,⁴
Abstract. We describe a simple approach for finding identical amino acid clusters on the outer surface of
α-helical coiled-coil proteins by examining the sequence of amino acids that compose the protein. Finding such
similarities is an important immunological problem, since these may correspond to cross-reactive epitopes,
i.e., sites at which antibodies produced against one protein also bind to another conformationally similar
protein. Because of the regularities inherent in a coiled-coil structure, the position of each amino acid on the
structure can be predicted. Based on this prediction, our algorithm finds similarities on the outer surface of the
proteins. The matches found by our algorithm serve as an important screening process, intended to indicate
which experiments to conduct to determine sites that correspond to cross-reactive epitopes. The location of
several cross-reactive epitopes between M proteins and myosins has been verified experimentally. Although
our approach makes many simplifying assumptions, these epitopes always correspond to clusters of identical
amino acids, which our algorithm predicted to be contiguous on the outer surface. Our algorithm runs in
O(n + m + r) time and O(n + m) space, where n and m are the lengths of the protein sequences, and r is the
number of matching amino acids that appear in the same structural position of the α-helix in both sequences.
Key Words. Computational biology, Protein structure, Coiled-coil, Epitope, Dynamic programming.
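To make the quantity r from the abstract concrete, the following is a minimal sketch, not the algorithm of this paper, that enumerates the pairs of identical amino acids occupying the same structural position in two sequences. It assumes each sequence follows a perfect, unbroken seven-position register labeled a through g with a known starting label; the names `heptad_positions` and `same_position_matches` and the toy sequences are hypothetical and chosen only for illustration. Bucketing one sequence by (residue, position) and scanning the other takes O(n + m + r) time, which matches the stated bound for this enumeration step only.

```python
from collections import defaultdict

HEPTAD = "abcdefg"  # labels of the seven structural positions


def heptad_positions(length, start="a"):
    """Label residue i with a register position.

    Assumption: the register is perfect and unbroken, and the label of
    the first residue (start) is known in advance.
    """
    offset = HEPTAD.index(start)
    return [HEPTAD[(offset + i) % 7] for i in range(length)]


def same_position_matches(seq1, seq2, start1="a", start2="a"):
    """Return all pairs (i, j) with seq1[i] == seq2[j] and equal labels.

    The number of returned pairs plays the role of r.
    """
    # Bucket the residues of seq2 by (amino acid, label): O(m).
    buckets = defaultdict(list)
    labels2 = heptad_positions(len(seq2), start2)
    for j, (residue, label) in enumerate(zip(seq2, labels2)):
        buckets[(residue, label)].append(j)

    # Scan seq1 and read off the matching bucket: O(n + r).
    matches = []
    labels1 = heptad_positions(len(seq1), start1)
    for i, (residue, label) in enumerate(zip(seq1, labels1)):
        for j in buckets.get((residue, label), ()):
            matches.append((i, j))
    return matches


if __name__ == "__main__":
    # Toy sequences, not real protein data.
    pairs = same_position_matches("LKELEDK", "LEKLKDE")
    print(len(pairs), pairs)
```

This sketch stops at listing the matches. Deciding which of them form contiguous clusters on the outer surface is the part the paper's algorithm addresses and is not attempted here.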
1. Introduction. We describe the design and application of a simple and fast algorithm
to identify identical clusters on the outer surface of two proteins whose conformations
consist of a known recurring motif, but whose three-dimensional coordinates are not
known. A program based on our algorithm can be found at http://catt.poly.edu/~jps. Our
approach is particularly well suited to coiled-coil proteins, because the regularity and
simplicity of their structure lend themselves to a simplified analysis. The coiled-coil
was first characterized by Crick [C], and has since been extensively analyzed
and recognized as the predominant feature in many proteins, as, for example, in [OKKA],
[LD], [CP], [FLSS], and [BW]. It was observed that the sequences of amino acids that
fold into a coiled-coil structure contain a seven-residue repeat of the form (a-b-c-d-e-f-g)_n,
where amino acids in positions a and d are hydrophobic and the intervening
ones have a high helix potential, corresponding to amino acids frequently found in alpha
helical structures. The coiled-coil structure consists of two coils that wrap around each
¹ This work was supported in part by NSF Grants CCR-9305873 and HRD-9627109, and was in part carried
out while J. P. Schmidt was a visiting Professor at Stanford University.
² Department of Computer Science, Polytechnic University, 6 MetroTech, Brooklyn, NY 11201, USA.
takip@photon.poly.edu; jps@pucs4.poly.edu.
³ The Rockefeller University, New York, NY 10021, USA. vaf@rockvax.rockefeller.edu.
⁴ Current address: Incyte Pharmaceuticals, 3174 Porter Drive, Palo Alto, CA 94304, USA. jschmidt@incyte.com.
Received June 7, 1997; revised March 23, 1998. Communicated by D. Gusfield and M.-Y. Kao.