PROCOS: Computational Analysis of Protein–Protein Complexes
FLORIAN FINK,¹ JOCHEN HOCHREIN,¹ VINCENT WOLOWSKI,² RAINER MERKL,³ WOLFRAM GRONWALD¹
¹ Institute of Functional Genomics, University of Regensburg, Regensburg, Germany
² Faculty of Mathematics and Computer Science, University of Hagen, Hagen, Germany
³ Institute of Biophysics and Physical Biochemistry, University of Regensburg, Regensburg, Germany
Received 9 November 2010; Revised 15 April 2011; Accepted 15 April 2011
DOI 10.1002/jcc.21837
Published online 31 May 2011 in Wiley Online Library (wileyonlinelibrary.com).
Abstract: One of the main challenges in protein–protein docking is a meaningful evaluation of the many putative
solutions. Here we present a program (PROCOS) that calculates, for a given complex, a probability-like measure of being
native. In contrast to scores often used for analyzing complex structures, the calculated probabilities offer the advantage
of providing a fixed range of expected values. This will allow, in principle, the comparison of models corresponding to
different targets that were solved with the same algorithm. Judgments are based on distributions of properties derived
from a large database of native and false complexes. For complex analysis PROCOS uses these property distributions of
native and false complexes together with a support vector machine (SVM). PROCOS was compared to the established
scoring schemes of ZRANK and DFIRE. Employing a set of experimentally solved native complexes, high probability
values above 50% were obtained for 90% of these structures. Next, the performance of PROCOS was tested on the 40
binary targets of the Dockground decoy set, on 14 targets of the RosettaDock decoy set and on 9 targets that participated
in the CAPRI scoring evaluation. Again, the advantage of using a probability-based scoring system became apparent, and
a reasonable number of near-native complexes was found among the top-ranked complexes. In conclusion, a novel, fully
automated method is presented that allows the reliable evaluation of protein–protein complexes.
© 2011 Wiley Periodicals, Inc. J Comput Chem 32: 2575–2586, 2011
Key words: protein–protein complex; docking; scoring; reranking; support vector machine
Introduction
Protein–Protein Interactions
Proteins are an essential part of nearly all cellular processes. One
important aspect of proteins is their three-dimensional structure,
which must be known to understand their function in detail. Most
frequently, protein structures are determined by means of X-ray
crystallography and NMR spectroscopy, leading to a rapidly growing
number of solved structures. To date, more than 60,000 protein
structures are deposited in the Protein Data Bank (PDB) available
at (www.rcsb.org).¹ However, cellular functions are rarely carried
out by single proteins but by complexes composed of several
interacting proteins. It has been estimated that each protein has nine
interaction partners on average.² However, due to experimental
complexity only a very small part of the deposited structures
consists of protein–protein complexes. High-throughput methods for
detecting protein interactions, such as yeast two-hybrid assays or
tandem-affinity-purification mass spectrometry, predict a large number
of protein–protein interactions. These experimental approaches
are supplemented by bioinformatic methods such as phylogenetic
profiling, investigations of gene neighborhoods, and gene fusion
analysis. Unfortunately, it is not possible to determine the
structures of all these protein complexes by experimental methods due
to limitations concerning large or transient complexes. In addition,
the experimental structure determination of protein–protein
complexes is in most cases a time-consuming and challenging process.
For that reason, computational approaches like docking algorithms
that predict the structure of these complexes are needed. During the
last few years, considerable effort has been put into the development
and application of docking algorithms; for a review, see ref. 3. The
success of docking algorithms has consistently improved over the last
years, as measured by the CAPRI blind docking experiment.⁴,⁵ Due
to such efforts, on the one hand the applicability of in silico created
complexes is becoming more widely accepted, and on the other hand the
various available docking algorithms can be objectively compared.
Correspondence to: W. Gronwald; e-mail: wolfram.gronwald@klinik.uni-regensburg.de
Contract/grant sponsor: Bavarian Genome Research Network (BAYGENE)