A Novel Database of Disulfide Patterns and its Application to the Discovery of Distantly Related Homologs

Herman W. T. van Vlijmen1, Abhas Gupta1, Lakshmi S. Narasimhan2 and Juswinder Singh1*

1 Structural Informatics Group, Biogen Inc., 14 Cambridge Center, Cambridge, MA 02142, USA
2 Discovery Technologies, Pfizer Global Research and Development, Ann Arbor Laboratories, Ann Arbor, MI 48105, USA

Disulfide bonds are conserved strongly among proteins of related structure and function. Despite the explosive growth of protein sequence databases and the vast numbers of sequence search tools, no tool exists to draw relations between the disulfide patterns of homologous proteins. We present a comprehensive database of disulfide bonding patterns and a search method to find proteins with similar disulfide patterns. The disulfide database was constructed using disulfide annotations extracted from SwissProt, and was expanded significantly from 16,736 to 94,499 disulfide-containing domains by an inference method that combines SwissProt annotations with Pfam multiple alignments. To search the database, we define a disulfide description, called the disulfide signature, which encodes both spacings between cysteine residues and cysteine connectivity. A web tool was developed that allows users to search for related disulfide patterns and for subpatterns resulting from the removal of one or more disulfides from the pattern. We explore the possibility of using disulfide pattern conservation to identify protein homologs that are undetectable by PSI-BLAST. Examples include the homology between a sea anemone antihypertensive/antiviral protein and a sea anemone neurotoxin, and the homology between tick anticoagulant peptide and bovine trypsin inhibitor. In both examples, there is a clear structural similarity and a functional relationship. We used the database to find structural homologs for the Cripto CFC domain.
The identification of a von Willebrand Factor C (VWFC)-like domain agrees with its functional role and explains mutation data. We believe that the rapid increase in structure determinations arising from structural genomics efforts and advances in mass spectrometry techniques will greatly increase the number of disulfide annotations. This information will become a valuable resource for structural and functional annotations of proteins. The availability of a searchable disulfide pattern database will thus provide a powerful new addition to existing homolog discovery methods.

© 2003 Elsevier Ltd. All rights reserved.

Keywords: disulfide; database; protein structure; homology; structural genomics

*Corresponding author. E-mail address of the corresponding author: juswinder.singh@biogenidec.com

Abbreviations used: rTAP, recombinant tick anticoagulant protein; BPTI, bovine pancreatic trypsin inhibitor; EGF, epidermal growth factor; VWFC, von Willebrand factor C.

doi:10.1016/j.jmb.2003.10.077
J. Mol. Biol. (2004) 335, 1083–1092

Introduction

Disulfide bridges are ubiquitous to prokaryotic and eukaryotic proteins alike. Formed by the covalent cross-linking of cysteine residues, these structural elements are found mostly in non-reducing environments, 1,2 and have been shown to provide significant stabilization to the tertiary folds of proteins. 3–7 The stabilizing effect of disulfides on a