Analysis of Covariation in an SH3 Domain Sequence Alignment: Applications in Tertiary Contact Prediction and the Design of Compensating Hydrophobic Core Substitutions

Stefan M. Larson 1, Ariel A. Di Nardo 2 and Alan R. Davidson 1,2 *

1 Department of Molecular and Medical Genetics, University of Toronto, Toronto, Ontario, Canada, M5S 1A8
2 Department of Biochemistry, University of Toronto, Toronto, Ontario, Canada, M5S 1A8

We have analyzed sequence covariation in an alignment of 266 non-redundant SH3 domain sequences using chi-squared statistical methods. Artifactual covariations arising from close evolutionary relationships among certain sequence subgroups were eliminated using empirically derived sequence diversity thresholds. This covariation detection method was able to predict residue-residue contacts (side-chain centres of mass within 8 Å) in the structure of the SH3 domain with an accuracy of 85%, which is greater than that achieved in many previous covariation studies. In examining the positions involved most frequently in covariations, we discovered a dramatic over-representation of a subset of five hydrophobic core positions. This covariation information was used to design second and third site substitutions that could compensate for highly destabilizing hydrophobic core substitutions in the Fyn SH3 domain, thus providing experimental data to validate the covariation analysis. The testing of our covariation detection method on 15 other alignments showed that the accuracy of contact prediction is highly variable depending on which sequence alignment is used, and useful levels of prediction accuracy were obtained with only approximately one-third of alignments. The results presented here provide insight into the difficulties inherent in covariation analysis, and suggest that it may have limited usefulness in tertiary structure prediction.
On the other hand, our ability to use covariation analysis to design stabilizing combinations of hydrophobic core substitutions attests to its potential utility for gaining deeper insight into the stability determinants and functional mechanisms of proteins with known three-dimensional structures.

© 2000 Academic Press

Keywords: covariation; SH3 domain; sequence alignment; contact prediction; hydrophobic core

*Corresponding author. E-mail address of the corresponding author: davidson@hkl.med.utoronto.ca

doi:10.1006/jmbi.2000.4146, available online at http://www.idealibrary.com
J. Mol. Biol. (2000) 303, 433-446
0022-2836/00/030433-14 $35.00/0

Introduction

To gain full benefit from the rapidly increasing volume of data contained in the protein sequence databases, the development of rigorously tested techniques for the extraction of useful information from sequence alignments is essential. It is well established that sequence alignments can be used to identify crucial positions for the function and/or stability of a protein (Bashford et al., 1987; Chothia, 1975), and improve structure prediction (Rost & Sander, 1995; Gerloff et al., 1997). In addition, knowledge of the frequencies of occurrence of amino acid residues at each position in a sequence alignment can aid in the prediction of the effects of site-directed mutations (Maxwell & Davidson, 1998; Steipe et al., 1994). However, other information contained within sequence alignments can be more difficult to quantify.

Covariation analysis, which involves the detection of positions within an alignment that mutate in a correlated manner, can potentially provide further insight into the meaning of conservation patterns seen in sequence alignments. Covariation occurs in a sequence alignment because certain mutations that debilitate the function of a protein can only persist through evolution if they are