Structural Insights into the Evolution of a Non-Biological Protein: Importance of Surface Residues in Protein Fold Optimization

Matthew D. Smith 1,2, Matthew A. Rosenow 1,2, Meitian Wang 2, James P. Allen 2, Jack W. Szostak 3, John C. Chaput 1,2 *

1 Center for BioOptical Nanotechnology, The Biodesign Institute, Arizona State University, Tempe, Arizona, United States of America
2 Department of Chemistry and Biochemistry, Arizona State University, Tempe, Arizona, United States of America
3 Howard Hughes Medical Institute, Department of Molecular Biology, Massachusetts General Hospital, Boston, Massachusetts, United States of America

Phylogenetic profiling of amino acid substitution patterns in proteins has led many to conclude that most structural information is carried by interior core residues that are solvent inaccessible. This conclusion is based on the observation that buried residues generally tolerate only conservative sequence changes, while surface residues allow more diverse chemical substitutions. This notion is now changing as it has become apparent that both core and surface residues play important roles in protein folding and stability. Unfortunately, identifying specific mutations that will lead to enhanced stability remains a challenging problem. Here we discuss two mutations that emerged from an in vitro selection experiment designed to improve the folding stability of a non-biological ATP binding protein. These mutations alter two solvent accessible residues, and dramatically enhance the expression, solubility, thermal stability, and ligand binding affinity of the protein. The significance of both mutations was investigated individually and together, and the X-ray crystal structures of the parent sequence and double mutant protein were solved to resolution limits of 2.8 and 1.65 Å, respectively.
Comparative structural analysis of the evolved protein and proteins found in nature reveals that our non-biological protein evolved certain structural features shared by many thermophilic proteins. This experimental result suggests that protein fold optimization by in vitro selection offers a viable approach to generating stable variants of many naturally occurring proteins whose structures and functions are otherwise difficult to study.

Citation: Smith MD, Rosenow MA, Wang M, Allen JP, Szostak JW, et al. (2007) Structural Insights into the Evolution of a Non-Biological Protein: Importance of Surface Residues in Protein Fold Optimization. PLoS ONE 2(5): e467. doi:10.1371/journal.pone.0000467

INTRODUCTION

We are interested in the extent to which nature samples the total structural diversity available in protein sequence space [1,2]. In pursuit of finding novel proteins with properties similar to natural proteins, we have discovered that functional proteins can be selected from large unconstrained libraries of random amino acid sequences [1]. The non-biological proteins that emerge from these selections are discovered in much the same way that aptamers are selected from large pools of DNA and RNA [3]. We call this approach de novo protein evolution since the in vitro process of selection and amplification closely mimics the natural process of Darwinian evolution. In these experiments we explore how functional proteins evolve by imposing a selective pressure on a diverse population of unrelated sequences to enrich for molecules with a desired functional property. By starting from a random pool of proteins we attempt to sample broad regions of protein sequence space for different independent solutions to a given functional problem. Following several rounds of selection, we then recover the descendants of rare functional proteins that originated from starting libraries as large as 10^13 different sequences.
It is important to note that while our starting libraries are large by conventional standards, the total number of sequences analyzed represents an extremely sparse sampling of all possible sequence combinations, indicating that protein sequence space is surprisingly rich in functional diversity.

In contrast to de novo protein design, which often relies on genetic algorithms to predict sequences consistent with a predetermined secondary or tertiary structure [4,5], the process of de novo protein evolution requires no prior knowledge of a protein's structure or mechanism in order for a selection to be successful. As a result, larger regions of the protein universe can be explored for protein structures that are unanticipated and therefore potentially much more novel than structures predicted by design.

Initial experiments in de novo protein evolution began with an attempt to ascertain the frequency of ATP binding proteins in a sampling of all possible sequences in a contiguous library of 80 amino acids [1]. Here functional activity was viewed as the ability to bind a desired small molecule target with high affinity and specificity. Given that many protein domains have sequence lengths between 50 and 100 amino acids and ATP binding proteins are present in every major enzyme class, we felt that the random region and target choice were appropriate to identify small protein domains with simple ligand binding activity [6]. Because the probability of finding functional domains in a stochastic sampling of protein sequences was anticipated to be very low, perhaps too low to detect with conventional technologies, we used a cell-free selection system called messenger RNA display to construct a starting library of greater than 10^12 non-redundant random-sequence proteins [7].

Academic Editor: Haiwei Song, Institute of Molecular and Cell Biology, Singapore

Received March 28, 2007; Accepted April 25, 2007; Published May 23, 2007

Copyright: © 2007 Smith et al.
This is an open-access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.

Funding: This work was supported by new laboratory start-up funds from The Biodesign Institute to JCC and grants from the NIH and NASA Astrobiology Institute to JWS. JWS is an Investigator, and JCC was a Research Associate, of the Howard Hughes Medical Institute.

Competing Interests: The authors have declared that no competing interests exist.

* To whom correspondence should be addressed. E-mail: john.chaput@asu.edu

PLoS ONE | www.plosone.org 1 May 2007 | Issue 5 | e467
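The Introduction's remark that a library of 10^12 to 10^13 sequences is an extremely sparse sampling of an 80-residue random region can be checked with simple arithmetic. The sketch below is an illustration only and is not part of the original analysis. It assumes that each of the 80 positions can hold any of the 20 canonical amino acids with equal probability, that there are no stop codons or composition bias, and that every library member is unique. The function names are hypothetical.

```python
from math import log10

# Assumptions (not stated in the text): a uniform 20-letter alphabet,
# no stop codons, and a library in which every member is distinct.
ALPHABET_SIZE = 20           # canonical amino acids
RANDOM_REGION_LENGTH = 80    # residues in the random region, from the text
LIBRARY_SIZES = (10**12, 10**13)  # library sizes quoted in the text


def log10_sequence_space(length, alphabet=ALPHABET_SIZE):
    """Return log10 of the number of possible sequences of the given length."""
    return length * log10(alphabet)


def log10_fraction_sampled(library_size, length, alphabet=ALPHABET_SIZE):
    """Return log10 of the fraction of sequence space covered by the library."""
    return log10(library_size) - log10_sequence_space(length, alphabet)


space = log10_sequence_space(RANDOM_REGION_LENGTH)
print(f"log10(possible sequences) = {space:.2f}")
for size in LIBRARY_SIZES:
    fraction = log10_fraction_sampled(size, RANDOM_REGION_LENGTH)
    print(f"library of 10^{log10(size):.0f}: log10(fraction sampled) = {fraction:.2f}")
```

Under these assumptions the sequence space contains about 10^104 members, so even a library of 10^13 sequences covers only on the order of 10^-91 of it. The calculation works in logarithms so that the magnitudes stay readable and no very large or very small numbers need to be handled directly.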