Article

Protein energetic conformational analysis from NMR chemical shifts (PECAN) and its use in determining secondary structural elements

Hamid R. Eghbalnia (a,b,*), Liya Wang (a,c,d), Arash Bahrami (a,c,d), Amir Assadi (b) & John L. Markley (a,c,d)

(a) Biochemistry Department, National Magnetic Resonance Facility at Madison, 433 Babcock Drive, Madison, WI, 53706, USA
(b) Mathematics Department, University of Wisconsin–Madison, 811 Van Vleck Hall, 480 Lincoln Drive, Madison, WI, 53706, USA
(c) Center for Eukaryotic Structural Genomics, University of Wisconsin–Madison, Madison, WI, 53706, USA
(d) Graduate Program in Biophysics, University of Wisconsin–Madison, Madison, WI, 53706, USA

Received 28 December 2004; Accepted 08 March 2005

Key words: chemical shifts, protein secondary structure, statistical energy model, statistical decision

Abstract

We present an energy model that combines information from the amino acid sequence of a protein and available NMR chemical shifts for the purposes of identifying low energy conformations and determining elements of secondary structure. The model (“PECAN”, Protein Energetic Conformational Analysis from NMR chemical shifts) optimizes a combination of sequence information and residue-specific statistical energy function to yield energetic descriptions most favorable to predicting secondary structure. Compared to prior methods for secondary structure determination, PECAN provides increased accuracy and range, particularly in regions of extended structure. Moreover, PECAN uses the energetics to identify residues located at the boundaries between regions of predicted secondary structure that may not fit the stringent secondary structure class definitions. The energy model offers insights into the local energetic patterns that underlie conformational preferences.
For example, it shows that the information content for defining secondary structure is localized about a residue and reaches a maximum when two residues on either side are considered. The current release of the PECAN software determines the well-defined regions of secondary structure in novel proteins with assigned chemical shifts with an overall accuracy of 90%, which is close to the practical limit of achievable accuracy in classifying the states.

*To whom correspondence should be addressed. E-mail: eghbalni@nmrfam.wisc.edu.
Journal of Biomolecular NMR (2005) 32: 71–81 © Springer 2005 DOI 10.1007/s10858-005-5705-1

Introduction

Protein secondary structure plays an important role in classifying proteins (Lesk and Rose, 1981) and in analyzing their functional properties (Przytycka et al., 1999). A host of methods have been developed for the prediction of secondary structure from atomic coordinates (determined from X-ray crystallography or NMR spectroscopy), NMR chemical shifts, or simply peptide sequences. The primary forces that govern secondary and tertiary structure are closely related, and it is generally assumed that a detailed characterization of the energetic genesis of secondary structure is a key step toward understanding protein folding. The accuracy of secondary structure prediction from amino acid sequence alone has been reported to be as high as 78% on selected datasets (Albrecht et al., 2003). The fact that secondary structure can be predicted from sequence with some measure of success indicates that amino acid sequences encode