Conservation Helps to Identify Biologically Relevant Crystal Contacts

William S. J. Valdar¹ and Janet M. Thornton¹,²*

¹ Biomolecular Structure and Modelling Unit, Biochemistry and Molecular Biology Department, University College London, UK
² Department of Crystallography, Birkbeck College, London, UK

Some crystal contacts are biologically relevant, most are not. We assess the utility of combining measures of size and conservation to discriminate between biological and non-biological contacts. Conservation and size information is calculated for crystal contacts in 53 families of homodimers and 65 families of monomers. Biological contacts are shown to be usually conserved and typically the largest contact in the crystal. A range of neural networks accepting different combinations and encodings of this information is used to answer the following questions: (1) is a given crystal contact biological, and (2) given all crystal contacts in a homodimer, which is the biological one? Predictions for (1) are performed on both homodimer and monomer datasets. The best performing neural network combined size and conservation inputs. For the homodimers, it correctly classified 48 out of 53 biological contacts and 364 out of 366 non-biological contacts, giving a combined accuracy of 98.3 %. A more robust performance statistic, the phi-coefficient, which accounts for imbalances in the dataset, gave a value of 0.92. Taking all 535 non-biological contacts from the 65 monomers, this predictor made erroneous classifications only 4.3 % of the time. Predictions for (2) were performed on homodimers only. The best performing network achieved a prediction accuracy of 98.1 % using size information alone. We conclude that in answering question (1) size and conservation combined discriminate biological from non-biological contacts better than either measure alone.
For answering question (2), we conclude that in our dataset size is so powerful a discriminant that conservation adds little predictive benefit.

© 2001 Academic Press

Keywords: proteins; crystal contacts; residue conservation; oligomers; interfaces

*Corresponding author

E-mail address of the corresponding author: thornton@biochem.ucl.ac.uk

Abbreviations used: PDB, Protein Data Bank; PQS, Protein Quaternary Structure database; ASA, accessible surface area; MMS, multiple multimeric states; SLP, single layer perceptron; MLPx, multilayer perceptron with x hidden units; RSA, relative accessible surface area; AO, aldehyde oxidoreductase; Doss, diversity of surface scores; MOP, molybdenum hydroxylase; XDH, xanthine dehydrogenase.

Introduction

Most crystal contacts are artifacts of crystallization that would not occur in solution or in the physiological state. But some of the observed contacts may be biologically relevant. Determining which contacts are biological and which are not is often difficult, particularly when, as frequently seems to be the case for entries in the Protein Data Bank (PDB),¹ the oligomeric state of the protein is uncertain or unknown.²

Biological contacts

Biological contacts, which here refer to any site of in vivo recognition between macromolecules, have received more attention than non-biological contacts or comparisons of the two. Biological interfaces have been characterized in terms of their geometric features, such as planarity, shape-complementarity and circularity, in terms of their chemistry, such as hydrophobicity and preference for certain amino acid residues, and in terms of residue conservation.³⁻⁸ Although a number of studies have sought to predict the location of biological interfaces based on some of these parameters⁹,¹⁰ or to dock partners (see Sternberg et al.,¹¹ and references therein), few have attempted to discriminate between biological and non-biological contacts,¹² a
doi:10.1006/jmbi.2001.5034, available online at http://www.idealibrary.com
J. Mol. Biol. (2001) 313, 399–416
0022-2836/01/020399–18 $35.00/0 © 2001 Academic Press