A Novel Database of Disulfide Patterns and its Application to the Discovery of Distantly Related Homologs

Herman W. T. van Vlijmen¹, Abhas Gupta¹, Lakshmi S. Narasimhan² and Juswinder Singh¹*

¹Structural Informatics Group, Biogen Inc., 14 Cambridge Center, Cambridge, MA 02142, USA
²Discovery Technologies, Pfizer Global Research and Development, Ann Arbor Laboratories, Ann Arbor, MI 48105, USA

Disulfide bonds are conserved strongly among proteins of related structure and function. Despite the explosive growth of protein sequence databases and the vast numbers of sequence search tools, no tool exists to draw relations between the disulfide patterns of homologous proteins. We present a comprehensive database of disulfide bonding patterns and a search method to find proteins with similar disulfide patterns. The disulfide database was constructed using disulfide annotations extracted from SwissProt, and was expanded significantly from 16,736 to 94,499 disulfide-containing domains by an inference method that combines SwissProt annotations with Pfam multiple alignments. To search the database, we define a disulfide description, called the disulfide signature, which encodes both spacings between cysteine residues and cysteine connectivity. A web tool was developed that allows users to search for related disulfide patterns and for subpatterns resulting from the removal of one or more disulfides from the pattern. We explore the possibility of using disulfide pattern conservation to identify protein homologs that are undetectable by PSI-BLAST. Examples include the homology between a sea anemone antihypertensive/antiviral protein and a sea anemone neurotoxin, and the homology between tick anticoagulant peptide and bovine trypsin inhibitor. In both examples, there is a clear structural similarity and a functional relationship. We used the database to find structural homologs for the Cripto CFC domain.
The identification of a von Willebrand Factor C (VWFC)-like domain agrees with its functional role and explains mutation data. We believe that the rapid increase in structure determinations arising from structural genomics efforts and advances in mass spectrometry techniques will greatly increase the number of disulfide annotations. This information will become a valuable resource for structural and functional annotations of proteins. The availability of a searchable disulfide pattern database will thus provide a powerful new addition to existing homolog discovery methods.

© 2003 Elsevier Ltd. All rights reserved.

Keywords: disulfide; database; protein structure; homology; structural genomics

*Corresponding author. E-mail address of the corresponding author: juswinder.singh@biogenidec.com

Abbreviations used: rTAP, recombinant tick anticoagulant protein; BPTI, bovine pancreatic trypsin inhibitor; EGF, epidermal growth factor; VWFC, von Willebrand factor C.

J. Mol. Biol. (2004) 335, 1083–1092
doi:10.1016/j.jmb.2003.10.077
0022-2836/$ - see front matter

Introduction

Disulfide bridges are ubiquitous in prokaryotic and eukaryotic proteins alike. Formed by the covalent cross-linking of cysteine residues, these structural elements are found mostly in non-reducing environments,¹,² and have been shown to provide significant stabilization to the tertiary folds of proteins.³⁻⁷ The stabilizing effect of disulfides on a