PIER: Protein Interface Recognition for Structural Proteomics

Irina Kufareva,1 Levon Budagyan,2 Eugene Raush,2 Maxim Totrov,2 and Ruben Abagyan1,2*

1 Scripps Research Institute, La Jolla, California 92037
2 Molsoft LLC, La Jolla, California 92037

ABSTRACT

Recent advances in structural proteomics call for the development of fast and reliable automatic methods for predicting functional surfaces of proteins with known three-dimensional structure, including binding sites for known and unknown protein partners as well as oligomerization interfaces. Despite significant progress, the problem is still far from being solved. Most existing methods rely, at least partially, on evolutionary information from multiple sequence alignments projected onto the protein surface. The common drawback of such methods is their limited applicability to proteins with a sparse set of sequence homologs, as well as their inability to detect interfaces in evolutionarily variable regions. In this study, the authors developed an improved method for predicting interfaces from a single protein structure, which is based on local statistical properties of the protein surface derived at the level of atomic groups. The proposed Protein IntErface Recognition (PIER) method achieved an overall precision of 60% at the recall threshold of 50% at the residue level on a diverse benchmark of 490 homodimeric, 62 heterodimeric, and 196 transient interfaces (compared with 25% precision at 50% recall expected from random residue function assignment). For 70% of proteins in the benchmark, the binding patch residues were successfully detected with precision exceeding 50% at 50% recall. The calculation took only seconds for an average 300-residue protein.
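To make the residue-level figures above concrete, the following is a minimal sketch of how precision at a fixed recall threshold can be computed from per-residue scores, together with the random baseline, which in expectation equals the fraction of residues that belong to the interface. This is not the authors' evaluation code: the function names, the cutoff convention (the first rank at which recall reaches the threshold), and the toy data are assumptions made for illustration only.

```python
from typing import Sequence


def precision_at_recall(scores: Sequence[float],
                        is_interface: Sequence[bool],
                        recall_threshold: float = 0.5) -> float:
    """Precision at the first ranked cutoff whose recall reaches the threshold.

    Hypothetical helper: residues are ranked by predicted score, and the
    top-ranked set is grown until it recovers the requested fraction of
    true interface residues.
    """
    if len(scores) != len(is_interface):
        raise ValueError("scores and labels must have the same length")
    if not 0.0 < recall_threshold <= 1.0:
        raise ValueError("recall_threshold must be in (0, 1]")
    total_positive = sum(bool(x) for x in is_interface)
    if total_positive == 0:
        raise ValueError("reference set contains no interface residues")

    # Rank residue indices from highest to lowest predicted score.
    order = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)

    true_positive = 0
    for n_predicted, idx in enumerate(order, start=1):
        if is_interface[idx]:
            true_positive += 1
        if true_positive / total_positive >= recall_threshold:
            return true_positive / n_predicted
    # Reached only through floating-point edge cases; all residues predicted.
    return total_positive / len(scores)


def random_baseline_precision(is_interface: Sequence[bool]) -> float:
    """Expected precision of random assignment: the interface fraction."""
    return sum(bool(x) for x in is_interface) / len(is_interface)


# Toy data (not from the paper): 8 surface residues, 2 of them interfacial.
toy_labels = [True, False, False, True, False, False, False, False]
toy_scores = [0.8, 0.9, 0.1, 0.7, 0.2, 0.3, 0.4, 0.05]

if __name__ == "__main__":
    print(precision_at_recall(toy_scores, toy_labels, 0.5))
    print(random_baseline_precision(toy_labels))
```

In this toy case the interface fraction is 2 of 8 residues, so random assignment is expected to give 25% precision at any recall level, which is the kind of baseline the abstract compares against.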
The authors demonstrated that adding the evolutionary conservation signal only marginally influenced the overall prediction performance on the benchmark; moreover, for certain classes of proteins, using this signal actually resulted in a deteriorated prediction. Thorough benchmarking using other datasets from the literature showed that PIER yielded improved performance as compared with several alignment-free or alignment-dependent predictions. The accuracy, efficiency, and dependence on structure alone make PIER a suitable tool for automated high-throughput annotation of protein structures emerging from structural proteomics projects.

Proteins 2007;67:400–417. © 2007 Wiley-Liss, Inc.

Key words: protein–protein interaction; structural proteomics; cell signaling and protein recognition; structure–function annotation; alignment-independent interface prediction

INTRODUCTION

As crystallographers continue producing novel protein structures with fully or partially unknown function, the question arises of what aspects of their biological function can be predicted from those structures. Predicting the propensity of a protein to form complexes with other proteins, the location of the interfaces, and possible oligomeric states (Refs. 1, 2) is of particular importance because of the role of protein interactions and associations in molecular biology (Refs. 3, 4). While modern docking algorithms are getting better at predicting protein association geometries (see Refs. 5–9 for reviews), they can only be used when the identities and three-dimensional structures of all partners are known; and even in those cases, the prediction is further complicated by induced fit, incompleteness or inadequate quality of available structures, and computational requirements. Most often, however, we either do not know what the second protein is, or do not have its structure.
Reliable prediction of protein binding interfaces from a single protein with a known 3D structure, therefore, becomes a key computational problem.

Abbreviations: ASA, solvent accessible surface area; CAPRI, critical assessment of prediction of interactions; MSA, multiple sequence alignment; NEIT, nonenzyme-inhibitor transient interaction; ODA, optimal docking area; PDB, Protein Data Bank; PIER, protein interface recognition; PLS, partial least squares regression; RMSD, root mean square deviation; SVM, support vector machines.
Grant sponsor: NIH; Grant number: 5-R01-GM071872-02.
The PIER predictor is available on the web: http://abagyan.scripps.edu/kufareva/pier.cgi. The dataset of 748 protein interfaces with the accompanying information can also be downloaded from this web site.
*Correspondence to: Ruben Abagyan, 10550 North Torrey Pines Rd., Mail TPC-28, La Jolla, CA 92037. E-mail: abagyan@scripps.edu
Received 9 December 2005; Revised 2 May 2006; Accepted 16 August 2006
Published online 13 February 2007 in Wiley InterScience (www.interscience.wiley.com). DOI: 10.1002/prot.21233
PROTEINS: Structure, Function, and Bioinformatics 67:400–417 (2007). © 2007 Wiley-Liss, Inc.

Existing methods for protein interface prediction can be divided into two classes: (i) methods incorporating evolutionary information in the form of certain conservation measures derived from multiple sequence alignments