An Analysis of the Helix-to-Strand Transition Between Peptides With Identical Sequence

Xianghong Zhou,¹,² Frank Alber,³ Gerd Folkers,² Gaston H. Gonnet,¹ and Gareth Chelvanayagam⁴*

¹ Department of Computer Science, Eidgenössische Technische Hochschule, Zürich, Switzerland
² Department of Applied Bioscience, Eidgenössische Technische Hochschule, Zürich, Switzerland
³ International School for Advanced Studies (SISSA) and Istituto Nazionale di Fisica della Materia (INFM), Trieste, Italy
⁴ Department of Computer Science, University of Western Australia, Perth, Australia

ABSTRACT

An analysis of peptide segments with identical sequence but significantly different structure was performed over non-redundant databases of protein structures. We focus on those peptides that fold into an α-helix in one protein but a β-strand in another. While the study shows that many such structurally ambivalent peptides contain amino acids with a strong helical preference collocated with amino acids with a strong strand preference, the results overwhelmingly indicate that the peptide's environment ultimately dictates its structure. Furthermore, the first naturally occurring structurally ambivalent nonapeptide from evolutionarily unrelated proteins is described, highlighting the intrinsic plasticity of peptide sequences. We even find seven proteins that show structural ambivalence under different conditions. Finally, a computer algorithm has been implemented to identify regions in a given sequence where secondary structure prediction programs are likely to make serious mispredictions. Proteins 2000;41:248–256. © 2000 Wiley-Liss, Inc.

Key words: structural ambivalence; protein secondary structure; structure prediction; sequence properties; sequence neighbours; long-range interaction; global environment

INTRODUCTION

α-Helices and β-strands are the two most distinct elements of protein secondary structure.
An -helix is formed mainly by local interactions while a -strand is usually formed by long-range interactions (i.e., residue i to residue (i+x), |x|4). Experimental results, as well as statistical analysis, show that different amino acids and their combinations have different propensities for -heli- cal or -strand formation. 1–10 These propensity scales provide important tools for secondary structure prediction, and in particular, methods that use local sequence informa- tion. 1,11,12 However, prediction methods based only on residue propensities are not foolproof 13 and various experi- mental studies have pointed out that the secondary struc- ture formation is strongly dependent on the environ- ment. 14 –17 For example, naturally occurring peptides were found to adopt an -helix conformation in organic solvent, but -strand in nonmicellar SDS. 15,18 Likewise, several theoretical studies showed that sequentially identical pep- tides in the Protein Data Bank (PDB 19 ) can adopt different secondary structures in different proteins. 20 –25 Even natu- rally occurring peptides as long as eight amino acids can be helical in one protein and a strand in another. 25 We term such peptides as structurally ambivalent. It is not known, however, just how far secondary structure formation is inﬂuenced by forces other than the sequence’s own intrin- sic propensity. Nor is it known if there is a minimum length for an autonomous folding unit based on the local interactions. Understanding the degree to which and the means by which the environment inﬂuences the structural ambivalence of peptides has implications for both protein design and the development of structure prediction meth- ods. This information is also important for elucidating the mechanisms by which proteins fold. 
Given the explosive growth of the PDB, the goal of this work is to assess the level of structural ambivalence among peptides with identical sequence in known structures and to examine the origin of their structural diversity.

Grant sponsor: Australian Research Council.
*Correspondence to: Dr. Gareth Chelvanayagam, Department of Computer Science, The University of Western Australia, Stirling Highway, Nedlands, Perth, W.A., 6009, Australia. E-mail: gareth@cs.uwa.edu.au
Received 13 March 2000; Accepted 8 June 2000
PROTEINS: Structure, Function, and Genetics 41:248–256 (2000). © 2000 Wiley-Liss, Inc.

COMPUTATIONAL METHODS

Database Survey

The Protein Data Bank (PDB) of June 1999 was used in this study. Secondary structure assignments were made automatically using the program package STRIDE.²⁶ One of our goals is the statistical analysis of peptide sequences with structural ambivalence. Thus, to avoid statistical bias caused by the large number of homologous proteins in the PDB, two protein sub-databases were used: one in which all protein-chain pairs have less than 25% sequence identity (DB1) and one in which all protein-chain pairs have less than 95% sequence identity (DB2). DB1 (1106 chains) and DB2 (3295 chains) were taken from the May 1999 version of the "PDB_Select database."²⁷

The selection of identical pairs of peptide sequences was performed as follows. First, we surveyed the complete PDB database, selecting all possible sequence pairs with four identical residues (4-mer). Where possible, these were