Long Loop Prediction Using the Protein Local Optimization
Program
Kai Zhu, David L. Pincus, Suwen Zhao, and Richard A. Friesner*
Department of Chemistry, Columbia University, New York, New York
ABSTRACT We have developed an improved sampling algorithm and
energy model for protein loop prediction, the combination of which
has yielded the first methodology capable of achieving good results
for the prediction of loop backbone conformations of 11 residues or
longer. Applied to our newly constructed test suite of 104 loops
ranging from 11 to 13 residues, our method obtains average/median
global backbone root-mean-square deviations (RMSDs) to the native
structure (superimposing the body of the protein, not the loop
itself) of 1.00/0.62 Å for 11 residue loops, 1.15/0.60 Å for 12
residue loops, and 1.25/0.76 Å for 13 residue loops. Sampling errors
are virtually eliminated, while energy errors leading to large
backbone RMSDs are very infrequent compared to any previously
reported efforts, including our own previous study. We attribute
this success to both an improved sampling algorithm and, more
critically, the inclusion of a hydrophobic term, which appears to
approximately fix a major flaw in the SGB solvation model that we
have been employing. A discussion of these results in the context of
the general question of the accuracy of continuum solvation models
is presented. Proteins 2006;65:438–452.
© 2006 Wiley-Liss, Inc.
Key words: loop prediction; conformational sampling; continuum
solvation model; hydrophobic
INTRODUCTION
Loop prediction has become a canonical problem in
assessing methods for high-resolution protein structural
modeling. Well-defined test cases can be constructed by
starting with a high-resolution structure from the Protein
Data Bank (PDB), defining a loop region, and predicting
the structure in that region while keeping the remaining
residues of the protein fixed at their crystallographic
coordinates. Realistic applications, such as enumeration of
alternative low-energy conformations of the loop (as, for
example, are frequently seen in flexible active sites such as
kinases), or construction of accurate loop conformations in
homology modeling, require reprediction of surrounding
side chains (and possibly other degrees of freedom) as well
as the loop itself. Thus, success in repredicting native loops
in the fixed, crystallographically determined environment
is necessary, but not sufficient, to enable useful practical
deployment of the methodology.
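Because the remaining residues are held at their crystallographic coordinates in these test cases, the predicted and native structures already share a common frame. A global backbone RMSD (body of the protein superimposed, loop not refit) therefore reduces to a direct coordinate comparison over the loop backbone atoms. The following is a minimal sketch of that calculation, not code from PLOP; the function name and the coordinate representation (lists of (x, y, z) tuples in Å, in matching atom order) are illustrative assumptions.

```python
import math


def global_backbone_rmsd(predicted, native):
    """RMSD over matched loop backbone atoms with no re-superposition.

    Assumes both coordinate lists are expressed in the same frame,
    e.g. because the rest of the protein was kept fixed (hypothetical
    input format: lists of (x, y, z) tuples in the same atom order).
    """
    if len(predicted) != len(native) or not native:
        raise ValueError("coordinate lists must be non-empty and equal in length")
    total = 0.0
    for (px, py, pz), (nx, ny, nz) in zip(predicted, native):
        # Accumulate the squared displacement of each matched atom pair.
        total += (px - nx) ** 2 + (py - ny) ** 2 + (pz - nz) ** 2
    return math.sqrt(total / len(native))


# Toy example: every atom is displaced by exactly 1 Å along z.
native_xyz = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0)]
predicted_xyz = [(0.0, 0.0, 1.0), (1.0, 0.0, 1.0)]
print(global_backbone_rmsd(predicted_xyz, native_xyz))  # prints 1.0
```

Refitting the loop onto itself before computing the RMSD would hide rigid-body displacements of the loop relative to the protein, which is why the loop is deliberately not superimposed here.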
In previous work (Ref. 1), we have introduced a new approach
to loop prediction, in which rigorous hierarchical sampling
algorithms are combined with a high-quality molecular
mechanics force field and continuum solvation model.
These methods have been implemented in the Protein
Local Optimization Program (PLOP) and were tested on a
suite of 800 loops ranging in length from 4 to 12 residues.
Qualitatively improved accuracy was obtained compared
to previous efforts at loop prediction, which principally
have employed approximate, knowledge-based potential
energy functions, as opposed to a model based on an
atomic-level description of the physical chemistry.
Although the results in Ref. 1 were encouraging, the
performance of the method clearly deteriorated beyond a
loop length of 9 residues. Both sampling errors (i.e.,
cases where the total energy of the predicted structure was
significantly higher than that of the minimized, or
side-chain optimized, native structure) and energy errors (cases
where the total energy of the predicted structure was
significantly lower than that of the native structure)
increased in frequency compared to shorter loops, and the
RMSDs from the native loop of both the sampling and
energy errors increased in magnitude. Furthermore, the
test suites used for assessing performance on longer loops
were inadequate in size.
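The distinction between the two failure modes can be expressed as a comparison of the predicted energy against the energy of the minimized (or side-chain optimized) native structure. The sketch below is illustrative only: the function name, the labels, and the tolerance `tol` are assumptions, since the text does not quantify what counts as "significantly" higher or lower.

```python
def classify_prediction(e_predicted, e_native, tol):
    """Label a prediction by its energy gap to the optimized native structure.

    `tol` is a hypothetical threshold for a "significant" energy
    difference, in the same units as the two energies.
    """
    gap = e_predicted - e_native
    if gap > tol:
        # The native state has lower energy but the search did not find it.
        return "sampling error"
    if gap < -tol:
        # The energy model ranks the prediction below the native state.
        return "energy error"
    return "comparable"


# Toy energies in arbitrary units, chosen only to exercise each branch.
print(classify_prediction(-100.0, -105.0, tol=1.0))  # prints sampling error
print(classify_prediction(-110.0, -105.0, tol=1.0))  # prints energy error
```

In practice such a label would be read alongside the RMSD to the native loop, since a small energy gap matters mainly when the predicted conformation is far from the native one.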
The problems observed in Ref. 1 for long loop prediction
are far from unique to that article. Table I (Refs. 2–5) presents
results taken from work by various groups in predicting
loops of length 11 or greater. All these approaches use
dihedral angle buildup and candidate selection by a scoring
or energy function, but they differ in algorithmic details and
in the composition of the energy function. A recent article by
Monnigmann et al. (Ref. 6) provides an overview and brief
descriptions of the various alternative methods. It should
be noted that the results in Table I were not generated on
the same test set; however, they do show a common
trend. There is a transition of some sort between 10 and 12
residues, which renders the loop prediction problem quali-
The Supplementary Material referred to in this article can be found
at http://www.interscience.wiley.com/jpages/0887-3585/suppmat/
The first two authors contributed equally to this work.
Grant sponsor: NIH; Grant number: GM 52018 (to R.A.F.).
*Correspondence to: Richard A. Friesner, Department of Chemistry,
Columbia University, New York, NY 10027. E-mail: rich@chem.columbia.edu
Received 3 October 2005; Revised 27 January 2006; Accepted 12
March 2006
Published online 22 August 2006 in Wiley InterScience
(www.interscience.wiley.com). DOI: 10.1002/prot.21040
PROTEINS: Structure, Function, and Bioinformatics 65:438 – 452 (2006)