Progress and Challenges in High-Resolution Refinement of Protein Structure Models

Kira M.S. Misura and David Baker*

Department of Biochemistry, University of Washington Health Sciences, Seattle, Washington

ABSTRACT Achieving atomic level accuracy in de novo structure prediction presents a formidable challenge even in the context of protein models with correct topologies. High-resolution refinement is a fundamental test of force field accuracy and sampling methodology, and its limited success in both comparative modeling and de novo prediction contexts highlights the limitations of current approaches. We constructed four tests to identify bottlenecks in our current approach and to guide progress in this challenging area. The first three tests showed that idealized native structures are stable under our refinement simulation conditions and that the refinement protocol can significantly decrease the root mean square deviation (RMSD) of perturbed native structures. In the fourth test we applied the refinement protocol to de novo models and showed that accurate models could be identified based on their energies, and in several cases many of the buried side chains adopted native-like conformations. We also showed that the differences in backbone and side-chain conformations between the refined de novo models and the native structures are largely localized to loop regions and regions where the native structure has unusual features such as rare rotamers or atypical hydrogen bonding between β-strands. The refined de novo models typically have higher energies than refined idealized native structures, indicating that sampling of local backbone conformations and side-chain packing arrangements in a condensed state is a primary obstacle. Proteins 2005;59:15–29. © 2005 Wiley-Liss, Inc.
Key words: protein structure prediction; model refinement; Rosetta; free energy function

INTRODUCTION

Substantial progress has been made in the area of de novo structure prediction; it is now possible to generate models with correct topologies for small proteins using several different methods, including the Rosetta de novo algorithm.¹⁻⁴ In many cases, features of native proteins such as turns, loops and relative orientations of secondary structure elements are captured in the de novo models. However, the overall accuracy of the models is not sufficient for applications requiring high-resolution detail. Even for small proteins of fewer than 100 amino acids, the root mean squared deviation (RMSD) over alpha carbon atoms between the native structure and the de novo models is typically greater than 3 Å. In addition, most energy functions cannot reliably distinguish models with the correct topology from those with non-native topologies. This is illustrated by the CASP4 and CASP5 experiments; while one of the five models generated by Rosetta and submitted as predictions often had the correct topology, it was frequently not the best-ranked model.

To increase the accuracy and reliability of protein structure models, it is necessary to develop methods that sample high-resolution details of native structures as well as potential energy functions that recognize the native state as the lowest energy conformation. One approach to this problem is to refine low-resolution models produced by de novo or template-based modeling methods. Successful refinement would improve de novo or template-based models by shifting their conformations closer to the native state; equally importantly, it would allow models with correct topologies and side-chain packing to be distinguished from non-native models based on their relative energies.
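The alpha carbon RMSD used above as the accuracy measure can be illustrated with a minimal sketch. It assumes the two coordinate sets list the same residues in the same order and have already been optimally superimposed; the superposition step is omitted here. The function name `ca_rmsd` and the toy coordinates are illustrative assumptions, not part of Rosetta or of the protocol described in this paper.

```python
import math


def ca_rmsd(model, native):
    """Return the RMSD (in the input units, e.g. Å) between two
    equal-length lists of (x, y, z) alpha carbon coordinates.

    Assumption: the structures are already superimposed, so no
    rotation or translation is applied here.
    """
    if len(model) != len(native):
        raise ValueError("coordinate lists must have the same length")
    if not model:
        raise ValueError("at least one atom is required")
    sq_sum = 0.0
    for (x1, y1, z1), (x2, y2, z2) in zip(model, native):
        # Squared distance between corresponding atoms
        sq_sum += (x1 - x2) ** 2 + (y1 - y2) ** 2 + (z1 - z2) ** 2
    # Mean over atoms, then square root
    return math.sqrt(sq_sum / len(model))


# Toy coordinates chosen for easy arithmetic, not realism:
# every model atom is displaced 3 Å from its native counterpart.
native = [(0.0, 0.0, 0.0), (3.8, 0.0, 0.0), (7.6, 0.0, 0.0)]
model = [(0.0, 3.0, 0.0), (3.8, -3.0, 0.0), (7.6, 0.0, 3.0)]

print(ca_rmsd(model, native))  # 3.0
```

Because each squared deviation enters the mean, a few badly placed residues (for example in a loop) can dominate the value even when the rest of the model is close to the native structure.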
Energy-based discrimination of models with greater than 3 Å RMSD to the native structure is problematic as the native side-chain packing arrangement is unlikely to be captured.⁵ High-resolution refinement would benefit de novo structure prediction as well as comparative modeling applications, where it is desirable to generate models that are more similar to the native structure than the starting template.

*Correspondence to: David Baker, Department of Biochemistry, University of Washington, Box 357350, J-567 Health Sciences, Seattle, WA 98195-7350. Email: dabaker@u.washington.edu
Received 22 July 2004; Accepted 16 September 2004
Published online 2 February 2005 in Wiley InterScience (www.interscience.wiley.com). DOI: 10.1002/prot.20376
PROTEINS: Structure, Function, and Bioinformatics 59:15–29 (2005) © 2005 WILEY-LISS, INC.

High-resolution refinement is a difficult task that requires an effective sampling strategy as well as an accurate energy function to guide the search through conformational space. Attempts to refine protein structure models into native-like conformations have been made previously. Lee et al. used molecular dynamics simulations with an explicit solvent model to refine Rosetta de novo models, followed by scoring with the Poisson–Boltzmann surface area solvation model.⁶ Their results showed that native structures could be distinguished from low-resolution models and that the native state is stable. Lu et al. used a combination of local constraints, knowledge-based potentials and molecular dynamics approaches.⁷ While these results were promising and showed improvements over