Structural similarity of DNA-binding domains of bacteriophage repressors and the globin core

S. Subbiah, D.V. Laurents and M. Levitt

Beckman Laboratories for Structural Biology, Departments of Cell Biology and Biochemistry, Stanford University School of Medicine, Stanford, California 94305-5307, USA.

Background: In recent years, the determination of large numbers of protein structures has created a need for automatic and objective methods for the comparison of structures or conformations. Many protein structures show similarities of conformation that are undetectable by comparing their sequences. Comparison of structures can reveal similarities between proteins thought to be unrelated, providing new insight into the interrelationships of sequence, structure and function.

Results: Using a new tool that we have developed to perform rapid structural alignment, we present the highlights of an exhaustive comparison of all pairs of protein structures in the Brookhaven protein database. Notably, we find that the DNA-binding domain of the bacteriophage repressor family is almost completely embedded in the larger eight-helix fold of the globin family of proteins. The significant match of specific residues is correlated with functional, structural and evolutionary information.

Conclusion: Our method can help to identify structurally similar folds rapidly and with high sensitivity, providing a powerful tool for analyzing the ever-increasing number of protein structures being elucidated.

Current Biology 1993, 3:141-148

Background

Although the number of protein structures deposited in the Brookhaven protein database (PDB) has grown rapidly in recent years [1], the subset of new protein folds has grown at a significantly slower rate [2]. This rate difference still persists after allowing for the many structural determinations of homologous, mutant and drug-complexed versions in the same basic protein family.
Therefore, assuming there is no systematic bias in the selection criteria in deciding which particular protein structure is to be determined, it has been suggested that we are ‘closing in’ on the complete repertoire of folds that are allowable from the multitude that constitute all possible protein structures [3]. The limited number of these folds may be due to evolution: once there are enough folds to create all possible protein functions, there is then no pressure to evolve new folds. On the other hand, the limit to the number of folds may be due to the existence of basic structural limitations that dictate, and thus relate, the three-dimensional structures of proteins. Finding and understanding such principles of protein construction will help in the design of new and variant proteins.

Assuming that the reservoir of unobserved folds is depleting rapidly, any structural constraints should be detectable in the structural database presently available to us. Suitable and exhaustive comparisons of these structures against each other could reveal unexpected similarities that could help catalogue and, perhaps, define structural principles. In this context, it is worth noting that analogous studies of the one-dimensional DNA and protein sequence databases, made possible by the development of elegant computer algorithms, have borne much fruit in identifying and cataloguing many novel sequence motifs of functional interest [4,5].

With regard to the problem of comparing two different three-dimensional protein structures considered here, despite early (and more recently plentiful) work in the development of suitable computer algorithms, systematic studies have been limited [6-11]. Many of the available methods have been hampered by limitations in accuracy, speed and sensitivity. Here we present a new method for protein structure comparison that is accurate, fast and sensitive.
Using this improved tool, we present the highlights of an exhaustive comparison of all pairs of protein structures in the PDB. The discovery of a significant structural similarity between two well-studied protein families, the bacteriophage repressors and the globins, emphasizes the power of our method. With its speed and sensitivity, it can aid the crystallographer and NMR spectroscopist in rapid identification of the relatedness of a newly determined structure to all previously reported ones. Such discoveries will in turn help to identify the rules that govern higher order structural motifs.

Correspondence to: M. Levitt. © Current Biology 1993, Vol 3 No 3

Results

Aligning structures

Our method aligns two protein structures by starting with an arbitrary equivalence of residues that are superimposed in three dimensions. A structural alignment matrix, which is calculated from distances between pairs of residues that are not in the same protein, is searched to achieve the optimal alignment. This gives