ConSurf: An Algorithmic Tool for the Identification of Functional Regions in Proteins by Surface Mapping of Phylogenetic Information

Aharon Armon1, Dan Graur2 and Nir Ben-Tal1*

1Department of Biochemistry and 2Department of Zoology, George S. Wise Faculty of Life Sciences, Tel Aviv University, Ramat Aviv 69978, Israel

Experimental approaches for the identification of functionally important regions on the surface of a protein involve mutagenesis, in which exposed residues are replaced one after another while changes in binding to other proteins or in activity are recorded. However, practical considerations limit these methods to small-scale studies, precluding a full mapping of all the functionally important residues on the surface of a protein. Here we present an alternative approach that uses evolutionary data, in the form of a multiple-sequence alignment of a protein family, to identify hot spots and surface patches that are likely to be in contact with other proteins, domains, peptides, DNA, RNA or ligands. The underlying assumption is that key residues that are important for binding should be conserved throughout evolution, just like residues that are crucial for maintaining the protein fold, i.e. buried residues. A main limitation in implementing this approach is that the sequence space of a protein family may be unevenly sampled; for example, mammals may be over-represented. Thus, a seemingly conserved position in the alignment may reflect taxonomically uneven sampling rather than structural or functional importance. To avoid this problem, we present a novel methodology based on evolutionary relations among proteins as revealed by inferred phylogenetic trees, and demonstrate its capability to map binding sites in SH2 and PTB signaling domains.
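The sampling problem described in the abstract can be illustrated with a small sketch. It compares a naive per-column conservation score (the fraction of sequences sharing the most common residue) with a tree-aware measure, the minimum number of substitutions at that position computed by Fitch's small-parsimony algorithm. This is offered only as an illustration of why phylogenetic relations matter, not as ConSurf's actual scoring scheme; the tree, species names and residues are hypothetical toy data, and the tree is assumed to be rooted and binary.

```python
from collections import Counter
from typing import Dict, Set, Tuple, Union

# A rooted binary tree: a leaf is a sequence name, an internal node is a pair of subtrees.
Tree = Union[str, Tuple["Tree", "Tree"]]


def naive_conservation(column: Dict[str, str]) -> float:
    """Fraction of sequences that share the most common residue at this position."""
    top_count = Counter(column.values()).most_common(1)[0][1]
    return top_count / len(column)


def fitch_changes(tree: Tree, column: Dict[str, str]) -> int:
    """Minimum number of residue substitutions at one alignment position,
    given a rooted binary tree (Fitch's small-parsimony algorithm)."""

    def visit(node: Tree) -> Tuple[Set[str], int]:
        if isinstance(node, str):  # leaf: the residue observed in that sequence
            return {column[node]}, 0
        left_set, left_cost = visit(node[0])
        right_set, right_cost = visit(node[1])
        shared = left_set & right_set
        if shared:  # children can agree, so no substitution is needed here
            return shared, left_cost + right_cost
        # children disagree: one substitution on a branch below this node
        return left_set | right_set, left_cost + right_cost + 1

    return visit(tree)[1]


# Hypothetical toy data: one alignment position where every mammal has Leu (L).
distant = ("fish", ("fly", "yeast"))
sparse_tree = (("human", "mouse"), distant)
dense_tree = (((("human", "chimp"), ("mouse", "rat")), ("cow", "dog")), distant)

residues = {"fish": "I", "fly": "V", "yeast": "M"}
for mammal in ("human", "chimp", "mouse", "rat", "cow", "dog"):
    residues[mammal] = "L"

sparse_column = {name: residues[name] for name in ("human", "mouse", "fish", "fly", "yeast")}
dense_column = dict(residues)

for label, tree, column in (("sparse", sparse_tree, sparse_column),
                            ("dense", dense_tree, dense_column)):
    print(f"{label}: naive={naive_conservation(column):.2f}, "
          f"tree changes={fitch_changes(tree, column)}")
```

Adding four more mammals that carry the same residue raises the naive score from 0.40 to 0.67, whereas the tree-based count stays at 3 substitutions: the extra sequences fall inside a clade that already agrees, so they add no independent evolutionary evidence of conservation.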
A computer program that implements these ideas is freely available at: http://ashtoret.tau.ac.il/ rony

© 2001 Academic Press

Keywords: molecular recognition; protein-protein interactions; protein modeling; phylogenetic trees

*Corresponding author. E-mail address: bental@ashtoret.tau.ac.il

Abbreviations used: MSA, multiple-sequence alignment; ConSurf, conservation surface mapping; PTB, phosphotyrosine binding; rmsd, root-mean-square deviation.

doi:10.1006/jmbi.2001.4474; J. Mol. Biol. (2001) 307, 447–463

Introduction

Mutual interactions between proteins, and between proteins and peptides, nucleic acids or ligands, play a vital role in every biological process. A detailed understanding of the mechanisms of these processes therefore requires the identification of the functionally important amino acids on the protein surface that mediate these interactions. Determining the three-dimensional (3D) structures of protein complexes is a useful way to single out functionally important residues at protein-protein interfaces. However, the 3D structures of protein complexes are often difficult to determine, and frequently only the structures of the unbound proteins (or domains) are available. In such cases, it is common to carry out tedious mutagenesis studies to determine functionally important residues. Because such studies require so much work, however, the RCSB Protein Data Bank1 contains a number of entries for which we have only partial information about function; for example, we may know that a certain protein is a kinase without being able to map the exact location of its active site. The fraction of such entries is expected to increase rapidly as a result of the various structural genomics initiatives.2,3

An alternative method to identify functionally important residues in proteins of known 3D