Progress and Challenges in High-Resolution Refinement of
Protein Structure Models
Kira M.S. Misura and David Baker*
Department of Biochemistry, University of Washington Health Sciences, Seattle, Washington
ABSTRACT Achieving atomic-level accuracy
in de novo structure prediction presents a formi-
dable challenge even in the context of protein mod-
els with correct topologies. High-resolution refine-
ment is a fundamental test of force field accuracy
and sampling methodology, and its limited success
in both comparative modeling and de novo predic-
tion contexts highlights the limitations of current
approaches. We constructed four tests to identify
bottlenecks in our current approach and to guide
progress in this challenging area. The first three
tests showed that idealized native structures are
stable under our refinement simulation conditions
and that the refinement protocol can significantly
decrease the root mean square deviation (RMSD) of
perturbed native structures. In the fourth test we
applied the refinement protocol to de novo models
and showed that accurate models could be identi-
fied based on their energies, and in several cases
many of the buried side chains adopted native-like
conformations. We also showed that the differences
in backbone and side-chain conformations between
the refined de novo models and the native structures
are largely localized to loop regions and regions
where the native structure has unusual features
such as rare rotamers or atypical hydrogen bonding
between β-strands. The refined de novo models typically
have higher energies than refined idealized
native structures, indicating that sampling of local
backbone conformations and side-chain packing
arrangements in a condensed state is a primary
obstacle. Proteins 2005;59:15–29.
© 2005 Wiley-Liss, Inc.
Key words: protein structure prediction; model re-
finement; Rosetta; free energy function
INTRODUCTION
Substantial progress has been made in the area of de
novo structure prediction; it is now possible to generate
models with correct topologies for small proteins using
several different methods, including the Rosetta de novo
algorithm.1–4
In many cases, features of native proteins
such as turns, loops and relative orientations of secondary
structure elements are captured in the de novo models.
However, the overall accuracy of the models is not suffi-
cient for applications requiring high-resolution detail.
Even for small proteins of fewer than 100 amino acids, the
root mean squared deviation (RMSD) over alpha carbon
atoms of the native structure to the de novo models is
typically greater than 3 Å. In addition, most energy
functions cannot reliably distinguish models with the
correct topology from those with non-native topologies.
This is illustrated by the CASP4 and CASP5 experiments;
while one of the five models generated by Rosetta and
submitted as predictions often had the correct topology, it
was frequently not the best-ranked model.
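The Cα RMSD used above as an accuracy measure can be illustrated with a minimal sketch. It assumes the model and native coordinates are already superimposed in a common frame and matched residue by residue; the function name `ca_rmsd` and the toy coordinates are illustrative assumptions, not part of the Rosetta software.

```python
import math

def ca_rmsd(model, native):
    """Root mean square deviation (in Å) over matched C-alpha atoms.

    Assumption: both coordinate lists are already optimally superimposed
    and list the same residues in the same order.
    """
    if len(model) != len(native) or not model:
        raise ValueError("need two non-empty, equal-length coordinate lists")
    # Sum squared differences over x, y, z for every matched atom pair.
    sq_sum = sum(
        (a - b) ** 2
        for p, q in zip(model, native)
        for a, b in zip(p, q)
    )
    return math.sqrt(sq_sum / len(model))

# Toy example: every model atom is displaced 3 Å along z from the native.
native = [(0.0, 0.0, 0.0), (3.8, 0.0, 0.0)]
model = [(0.0, 0.0, 3.0), (3.8, 0.0, 3.0)]
print(ca_rmsd(model, native))
```

In practice the two structures would first be superimposed (for example by a least-squares fit) before this value is computed; that step is omitted here.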
To increase the accuracy and reliability of protein struc-
ture models, it is necessary to develop methods that
sample high-resolution details of native structures as well
as potential energy functions that recognize the native
state as the lowest energy conformation. One approach to
this problem is to refine low-resolution models produced by
de novo or template-based modeling methods. Successful
refinement would improve de novo or template-based
models by shifting their conformations closer to the native
state; equally importantly, it would allow models with
correct topologies and side-chain packing to be distin-
guished from non-native models based on their relative
energies. Energy-based discrimination of models with
greater than 3 Å RMSD to the native structure is problematic
as the native side-chain packing arrangement is unlikely
to be captured.5
High-resolution refinement would benefit
de novo structure prediction as well as comparative model-
ing applications, where it is desirable to generate models
that are more similar to the native structure than the
starting template.
High-resolution refinement is a difficult task that re-
quires an effective sampling strategy as well as an accu-
rate energy function to guide the search through conforma-
tional space. Attempts to refine protein structure models
into native-like conformations have been made previously.
Lee et al. used molecular dynamics simulations with an
explicit solvent model to refine Rosetta de novo models
followed by scoring with the Poisson–Boltzmann surface
area solvation model.6
Their results showed that native
structures could be distinguished from low-resolution mod-
els and that the native state is stable. Lu et al. used a
combination of local constraints, knowledge-based
potentials and molecular dynamics approaches.7
While these
results were promising and showed improvements over
*Correspondence to: David Baker, Department of Biochemistry,
University of Washington, Box 357350, J-567 Health Sciences, Se-
attle, WA 98195-7350. Email: dabaker@u.washington.edu
Received 22 July 2004; Accepted 16 September 2004
Published online 2 February 2005 in Wiley InterScience
(www.interscience.wiley.com). DOI: 10.1002/prot.20376
PROTEINS: Structure, Function, and Bioinformatics 59:15–29 (2005)