Amino Acid Propensities are Position-dependent Throughout the Length of α-Helices

Donald E. Engel¹ and William F. DeGrado²*

¹ Department of Physics, University of Pennsylvania, Philadelphia PA 19104, USA
² Department of Biochemistry and Molecular Biophysics, School of Medicine, University of Pennsylvania, Stella-Chance bldg Room 1010, 421 Curie Blvd, Philadelphia PA 19104-6059, USA

*Corresponding author

The 20 commonly occurring amino acids have been shown to have distinct position-dependent, helix-forming propensities near the ends of α-helices. Here, we show that the amino acids also have very strong position-dependent propensities throughout the length of a helix. Most helices are amphiphilic, and they have a strong tendency to both begin and end on the solvent-inaccessible face of the helix. These position-specific propensities should provide valuable parameters to guide de novo protein design, and should allow more precise prediction of helical topology in natural proteins.

© 2004 Elsevier Ltd. All rights reserved.

Keywords: de novo design; helix capping; fractional solvent accessibility; hydrophobicity; position-dependent propensity

Introduction

An understanding of the relationship between amino acid sequence and helix formation is essential for both prediction of tertiary structures and design of structures unprecedented in nature. Different amino acids have distinct propensities for the adoption of helical, strand, and random coil conformations [1]. These propensities have formed the bases for many secondary structure prediction schemes [2]. However, precise prediction of the beginning and endpoints of helices has been somewhat problematic, although position-specific propensities have assisted somewhat in this process [3–7].

By convention, N1 is the first helical residue from the N terminus of a helix, N2 is the second, and so forth, with the N-cap position being the position preceding N1.
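The labeling convention described above can be illustrated with a minimal sketch. It assumes a per-residue secondary-structure string in which helical residues are marked `H` (a DSSP-like encoding chosen here purely for illustration); the function name `label_n_terminal_positions` and its `max_n` parameter are hypothetical and are not taken from this work.

```python
def label_n_terminal_positions(ss, max_n=3):
    """Label N-terminal helix positions in a secondary-structure string.

    ss    -- one character per residue; 'H' marks a helical residue
             (assumed encoding, for illustration only)
    max_n -- how many positions (N1..N<max_n>) to label per helix

    Returns one dict per helix mapping 'N-cap', 'N1', 'N2', ... to
    0-based residue indices.
    """
    helices = []
    i, n = 0, len(ss)
    while i < n:
        if ss[i] != "H":
            i += 1
            continue
        start = i                      # first helical residue = N1
        while i < n and ss[i] == "H":  # advance to the end of this helix
            i += 1
        length = i - start
        labels = {}
        if start > 0:
            # N-cap is the residue immediately preceding N1; it is
            # absent when the helix starts at the first residue.
            labels["N-cap"] = start - 1
        for k in range(1, min(max_n, length) + 1):
            labels[f"N{k}"] = start + k - 1
        helices.append(labels)
    return helices


# Usage: one helix spanning indices 2..7 of a ten-residue chain.
example = label_n_terminal_positions("CCHHHHHHCC")
print(example)
```

Under these assumptions the N-cap is simply the index one before the first `H` of each run, which mirrors the point that the label is positional and does not depend on any particular interaction at that residue.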
This labeling scheme [8] does not require a hydrogen-bonded capping interaction at the N-cap position. Early studies showed very strong differences in the propensities of amino acids to reside at the N-cap, N1, N2, and N3 positions [9–11]. However, the sequence dependence of helical propensities has not been investigated at positions between the ends of the helices. Moreover, it was widely assumed that beyond the first few residues the average environment would become uniform, leading to essentially isotropic distributions [12]. Here, we provide the unexpected finding that sequence-dependent propensities are strong, propagating at least 15 residues from the N terminus of a helix. Further, we demonstrate that this effect has its origin in a preferred orientation of the helix relative to its N-cap position.

Results and Discussion

For this study we used the Protein Data Bank (PDB) Select April 2002 list of non-redundant protein chains (25% threshold version) [13]. Defining helices as described by Gunasekaran [14], this list contains 1739 chains from 1670 proteins, with a total of 8227 helices of at least five residues. Figure 1 illustrates the frequency of observation of helices of different lengths in the overall population. In agreement with earlier workers, we observe a peak near N = 10 [15]. Following an approach analogous to Doig's [16], we fit the distribution to a smooth function. We found that a vertically shifted Gaussian provided a somewhat better fit than Doig's quartic expression. The difference between observed lengths and the Gaussian fit is expressed in the inset plot of weighted residuals ($wr_i$), defined in terms of the observed frequency $f_i(\mathrm{obs})$

0022-2836/$ - see front matter © 2004 Elsevier Ltd. All rights reserved.
Supplementary data associated with this article can be found at doi: 10.1016/j.jmb.2004.02.004
E-mail address of the corresponding author: wdegrado@mail.med.upenn.edu
doi:10.1016/j.jmb.2004.02.004
J. Mol. Biol. (2004) 337, 1195–1205