PROCOS: Computational Analysis of Protein–Protein Complexes

FLORIAN FINK,¹ JOCHEN HOCHREIN,¹ VINCENT WOLOWSKI,² RAINER MERKL,³ WOLFRAM GRONWALD¹

¹ Institute of Functional Genomics, University of Regensburg, Regensburg, Germany
² Faculty of Mathematics and Computer Science, University of Hagen, Hagen, Germany
³ Institute of Biophysics and Physical Biochemistry, University of Regensburg, Regensburg, Germany

Received 9 November 2010; Revised 15 April 2011; Accepted 15 April 2011
DOI 10.1002/jcc.21837
Published online 31 May 2011 in Wiley Online Library (wileyonlinelibrary.com).

Abstract: One of the main challenges in protein–protein docking is a meaningful evaluation of the many putative solutions. Here we present a program (PROCOS) that calculates, for a given complex, a probability-like measure that the complex is native. In contrast to the scores often used for analyzing complex structures, the calculated probabilities have the advantage of a fixed range of expected values. In principle, this allows models corresponding to different targets that were solved with the same algorithm to be compared. Judgments are based on distributions of properties derived from a large database of native and false complexes. For complex analysis, PROCOS uses these property distributions of native and false complexes together with a support vector machine (SVM). PROCOS was compared to the established scoring schemes of ZRANK and DFIRE. For a set of experimentally solved native complexes, probability values above 50% were obtained for 90% of the structures. Next, the performance of PROCOS was tested on the 40 binary targets of the Dockground decoy set, on 14 targets of the RosettaDock decoy set, and on 9 targets that participated in the CAPRI scoring evaluation. Again, the advantage of a probability-based scoring system becomes apparent, and a reasonable number of near-native complexes was found among the top-ranked complexes.
In conclusion, a novel, fully automated method is presented that allows the reliable evaluation of protein–protein complexes.

© 2011 Wiley Periodicals, Inc. J Comput Chem 32: 2575–2586, 2011

Key words: protein–protein complex; docking; scoring; reranking; support vector machine

Introduction

Protein–Protein Interactions

Proteins are an essential part of nearly all cellular processes. To understand their function in detail, their three-dimensional structure must be known. Most protein structures are determined by X-ray crystallography or NMR spectroscopy, and the number of solved structures is growing rapidly. To date, more than 60,000 protein structures have been deposited in the Protein Data Bank (PDB; www.rcsb.org) [1]. However, cellular functions are rarely carried out by single proteins; they are usually performed by complexes composed of several interacting proteins. It has been estimated that each protein has nine interaction partners on average [2]. Yet, because of the experimental difficulty, only a very small fraction of the deposited structures are protein–protein complexes. High-throughput methods for detecting protein interactions, such as yeast two-hybrid assays or tandem affinity purification coupled with mass spectrometry, predict a large number of protein–protein interactions. These experimental approaches are supplemented by bioinformatic methods such as phylogenetic profiling, investigation of gene neighborhoods, and gene fusion analysis. Unfortunately, experimental methods cannot determine the structures of all these complexes, owing to their limitations with large or transient complexes. In addition, experimental structure determination of protein–protein complexes is in most cases a time-consuming and challenging process. Computational approaches, such as docking algorithms that predict the structure of these complexes, are therefore needed.
During the last few years, considerable effort has been put into the development and application of docking algorithms; for a review, see [3]. As measured by the CAPRI blind docking experiment, the success of docking algorithms has improved consistently over recent years [4, 5]. Owing to such efforts, in silico generated complexes are becoming more widely accepted, and the various available docking algorithms can be compared objectively.

Correspondence to: W. Gronwald; e-mail: wolfram.gronwald@klinik.uni-regensburg.de
Contract/grant sponsor: Bavarian Genome Research Network (BAYGENE)
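The central scoring idea from the abstract is that PROCOS combines distributions of properties observed in native and false complexes with an SVM to produce a bounded, probability-like score. That idea can be illustrated with the toy sketch below. It is not the published PROCOS implementation, and every specific is an assumption: the two properties and their Gaussian-distributed values are synthetic; the per-property features are histogram log-odds (native vs. false); the classifier is a linear SVM trained with Pegasos-style subgradient descent; and the SVM decision value is mapped into (0, 1) by a Platt-style sigmoid whose parameters (a = -1, b = 0) are placeholders rather than fitted values. All function names (`bin_index`, `histogram`, `log_odds`, `features`, `train_linear_svm`, `probability_native`) are hypothetical.

```python
import math
import random

# Hypothetical sketch, NOT the published PROCOS code: per-property log-odds
# from native vs. false histograms -> linear SVM -> sigmoid into (0, 1).


def bin_index(value, lo, hi, bins):
    """Equal-width bin on [lo, hi) holding value; out-of-range values go to the edge bins."""
    i = int((value - lo) / (hi - lo) * bins)
    return min(max(i, 0), bins - 1)


def histogram(values, lo, hi, bins):
    counts = [0] * bins
    for v in values:
        counts[bin_index(v, lo, hi, bins)] += 1
    return counts


def log_odds(value, native_counts, false_counts, lo, hi):
    """log P(bin | native) - log P(bin | false); add-one smoothing keeps empty bins finite."""
    bins = len(native_counts)
    i = bin_index(value, lo, hi, bins)
    p_native = (native_counts[i] + 1) / (sum(native_counts) + bins)
    p_false = (false_counts[i] + 1) / (sum(false_counts) + bins)
    return math.log(p_native / p_false)


def features(props, tables, lo, hi):
    """One log-odds feature per property; tables[k] = (native_counts, false_counts)."""
    return [log_odds(v, nat, fal, lo, hi) for v, (nat, fal) in zip(props, tables)]


def train_linear_svm(X, y, lam=0.01, epochs=50, seed=0):
    """Pegasos-style subgradient descent on the L2-regularized hinge loss.
    y holds +1 (native) / -1 (false); a constant 1.0 feature acts as the bias
    (so the bias is regularized too, a simplification)."""
    data = [(x + [1.0], label) for x, label in zip(X, y)]
    w = [0.0] * len(data[0][0])
    order_rng = random.Random(seed)
    t = 0
    for _ in range(epochs):
        order_rng.shuffle(data)
        for x, label in data:
            t += 1
            eta = 1.0 / (lam * t)
            margin = label * sum(wi * xi for wi, xi in zip(w, x))
            w = [(1.0 - eta * lam) * wi for wi in w]  # shrink (regularizer)
            if margin < 1.0:  # hinge-loss violation: step toward the example
                w = [wi + eta * label * xi for wi, xi in zip(w, x)]
    return w


def probability_native(x, w, a=-1.0, b=0.0):
    """Platt-style sigmoid 1 / (1 + exp(a*f + b)) of the SVM decision value f.
    a and b are placeholders; Platt scaling would fit them on held-out data."""
    f = sum(wi * xi for wi, xi in zip(w, x + [1.0]))
    return 1.0 / (1.0 + math.exp(a * f + b))


# Toy data: two hypothetical properties per complex, drawn from shifted Gaussians.
rng = random.Random(0)
LO, HI, BINS = -6.0, 6.0, 12
native = [[rng.gauss(2.0, 1.0), rng.gauss(1.5, 1.0)] for _ in range(200)]
decoys = [[rng.gauss(-2.0, 1.0), rng.gauss(-1.5, 1.0)] for _ in range(200)]
tables = [(histogram([c[k] for c in native], LO, HI, BINS),
           histogram([c[k] for c in decoys], LO, HI, BINS)) for k in range(2)]

X = [features(c, tables, LO, HI) for c in native + decoys]
y = [1] * len(native) + [-1] * len(decoys)
w = train_linear_svm(X, y)

p_good = probability_native(features([2.5, 2.0], tables, LO, HI), w)
p_bad = probability_native(features([-2.5, -2.0], tables, LO, HI), w)
print(f"native-like: {p_good:.2f}  false-like: {p_bad:.2f}")
```

Because the output is confined to (0, 1), a cutoff such as the 50% level quoted in the abstract can, in principle, be applied unchanged to every target, whereas a raw energy-like score has no such fixed scale. In a realistic setting, the sigmoid parameters would be fitted, and the property tables would come from a large database of native and false complexes rather than from synthetic Gaussians.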