NATURE BIOTECHNOLOGY VOLUME 25 NUMBER 3 MARCH 2007 297

Sequence-activity relationships guide directed evolution

Joelle N Pelletier & Robert Lortie

A new method for in vitro evolution integrates computational analysis and experimental screening.

Directed evolution to improve the properties of proteins faces a problem of numbers: for an average-sized protein, the sequence space to be explored is astronomically large, whereas practical screening capacity is limited to thousands or millions of variants. Virtual screening in silico can analyze a much higher fraction of the total number of variants and has considerable potential, but is not yet widely implemented. In this issue, Fox et al.¹ present an alternative: a combined computational and experimental method in which sequence space is sifted along multiple paths marked by experimental data points. Their approach, which incorporates a protein sequence-activity relationship (ProSAR) algorithm, promises to become a useful general strategy for evolving proteins with novel properties.

The authors apply ProSAR to address the challenging synthesis of the (3R,5S)-dihydroxyheptanoate side chain of atorvastatin, the active ingredient in the blockbuster cholesterol-lowering drug Lipitor. Their starting point, a halohydrin dehalogenase from Agrobacterium radiobacter, catalyzes the cyanation of ethyl (S)-4-chloro-3-hydroxybutyrate into the synthetic intermediate ethyl (R)-4-cyano-3-hydroxybutyrate, but with an efficiency under process conditions that is several thousand-fold too low to be economically viable. The authors' investment in 18 rounds of directed evolution, each providing a modest improvement over the previous one, paid off handsomely: they achieved

Joelle N. Pelletier is at the Département de chimie and Département de biochimie, Université de Montréal, C.P.
6128, Succursale Centre-Ville, Montréal, Québec, Canada, H3C 3J7 and Robert Lortie is at the Biotechnology Research Institute, National Research Council, 6100 Royalmount Avenue, Montréal, Québec, Canada, H4P 2R2 and Département de chimie, Université de Montréal.
e-mail: joelle.pelletier@umontreal.ca or robert.lortie@cnrc-nrc.gc.ca

NEWS AND VIEWS
© 2007 Nature Publishing Group http://www.nature.com/naturebiotechnology

[Figure 1 image not reproduced. Panel a: mutated sequences and their measured performance feed a partial least-squares regression, which postulates the effect of each mutation on performance (include mutation 1; discard mutations 2 and 4; retest mutation 3) and yields a new backbone carrying new mutations. Panel b: relative productivity (axis 0 to 4,000) plotted against generations (axis 0 to 15), with the target productivity marked.]

Figure 1 Mathematical analysis bolsters directed evolution. (a) Partial least-squares regression is used to predict the effects of individual mutations on the performance of enzyme variants, each of which carries multiple mutations. Deleterious mutations are discarded, and favorable mutations are included for the next round of screening. Whenever the effect of a given mutation is uncertain, it is retested in a subsequent round. (b) The rewards of a small increase in enzyme productivity (~1.5-fold) per round become evident in the later rounds.
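
The procedure described in the Figure 1 caption can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the ProSAR implementation of Fox et al.: it fits a single-component PLS1 regression (a deliberately reduced form of partial least-squares regression) to an invented set of eight variants, then sorts each mutation into include, discard or retest. The function names, the performance values and the 0.1 threshold for "uncertain" are all hypothetical, and the toy data were chosen so that the outcome matches the schematic labels in panel a. A second helper shows how a ~1.5-fold gain per round compounds over 18 rounds.

```python
from math import sqrt


def pls1_one_component(X, y):
    """Single-component PLS1 regression: one coefficient per column of X."""
    n, p = len(X), len(X[0])
    # Center the mutation matrix and the performance vector.
    x_mean = [sum(row[j] for row in X) / n for j in range(p)]
    y_mean = sum(y) / n
    Xc = [[row[j] - x_mean[j] for j in range(p)] for row in X]
    yc = [v - y_mean for v in y]
    # Weight vector: normalized covariance of each mutation with performance.
    cov = [sum(Xc[i][j] * yc[i] for i in range(n)) for j in range(p)]
    norm = sqrt(sum(c * c for c in cov))
    if norm == 0:
        raise ValueError("no mutation covaries with performance")
    w = [c / norm for c in cov]
    # Scores t = Xc w, then regress performance on the scores.
    t = [sum(Xc[i][j] * w[j] for j in range(p)) for i in range(n)]
    q = sum(ti * yi for ti, yi in zip(t, yc)) / sum(ti * ti for ti in t)
    # With a single component the coefficients are simply q * w.
    return [q * wj for wj in w]


def classify(coefs, threshold=0.1):
    """Label mutations 1..p; the threshold is an assumed cut-off, not from the source."""
    labels = {}
    for number, b in enumerate(coefs, start=1):
        if b > threshold:
            labels[number] = "include"
        elif b < -threshold:
            labels[number] = "discard"
        else:
            labels[number] = "retest"
    return labels


def compounded_gain(fold_per_round, rounds):
    """Overall fold improvement when each round multiplies productivity."""
    return fold_per_round ** rounds


# Invented toy library: each row is a variant, each column marks the
# presence (1) or absence (0) of mutations 1 to 4.
VARIANTS = [
    (0, 0, 0, 0), (0, 0, 1, 1), (0, 1, 0, 1), (0, 1, 1, 0),
    (1, 0, 0, 1), (1, 0, 1, 0), (1, 1, 0, 0), (1, 1, 1, 1),
]
# Invented relative performance of each variant (not data from the paper).
PERFORMANCE = [1.00, 0.72, 0.30, 0.62, 1.20, 1.52, 1.10, 0.82]

coefs = pls1_one_component(VARIANTS, PERFORMANCE)
print(classify(coefs))
# {1: 'include', 2: 'discard', 3: 'retest', 4: 'discard'}
print(round(compounded_gain(1.5, 18)))
# 1478
```

The toy library is balanced, so a single component is enough to recover each mutation's effect here; real screening data would generally call for more components and a statistical criterion for deciding when an effect is uncertain. The compounding figure of roughly 1,500-fold is only the arithmetic consequence of assuming exactly 1.5-fold per round for 18 rounds.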