An Analysis of the Helix-to-Strand Transition Between
Peptides With Identical Sequence
Xianghong Zhou,1,2 Frank Alber,3 Gerd Folkers,2 Gaston H. Gonnet,1 and Gareth Chelvanayagam4*
1 Department of Computer Science, Eidgenössische Technische Hochschule, Zürich, Switzerland
2 Department of Applied Bioscience, Eidgenössische Technische Hochschule, Zürich, Switzerland
3 International School for Advanced Studies (SISSA) and Istituto Nazionale di Fisica della Materia (INFM), Trieste, Italy
4 Department of Computer Science, University of Western Australia, Perth, Australia
ABSTRACT An analysis of peptide segments that have identical sequences but differ significantly in structure was performed over non-redundant databases of protein structures. We focus on those peptides that fold into an α-helix in one protein but a β-strand in another. While the study shows that many such structurally ambivalent peptides contain amino acids with a strong helical preference collocated with amino acids with a strong strand preference, the results overwhelmingly indicate that the peptide's environment ultimately dictates its structure. Furthermore, the first naturally occurring structurally ambivalent nonapeptide from evolutionarily unrelated proteins is described, highlighting the intrinsic plasticity of peptide sequences. We even find seven proteins that show structural ambivalence under different conditions. Finally, a computer algorithm has been implemented to identify regions in a given sequence where secondary structure prediction programs are likely to make serious mispredictions. Proteins 2000;41:248–256.
© 2000 Wiley-Liss, Inc.
Key words: structural ambivalence; protein secondary structure; structure prediction; sequence properties; sequence neighbours; long-range interaction; global environment
INTRODUCTION
α-Helices and β-strands are the two most distinct elements of protein secondary structure. An α-helix is formed mainly by local interactions, while a β-strand is usually formed by long-range interactions (i.e., residue i to residue (i+x), |x| > 4). Experimental results, as well as statistical analyses, show that different amino acids and their combinations have different propensities for α-helix or β-strand formation [1–10]. These propensity scales provide important tools for secondary structure prediction, in particular for methods that use local sequence information [1,11,12]. However, prediction methods based only on residue propensities are not foolproof [13], and various experimental studies have pointed out that secondary structure formation is strongly dependent on the environment [14–17]. For example, naturally occurring peptides were found to adopt an α-helical conformation in organic solvent, but a β-strand conformation in nonmicellar SDS [15,18]. Likewise, several theoretical studies showed that sequentially identical peptides in the Protein Data Bank (PDB [19]) can adopt different secondary structures in different proteins [20–25]. Even naturally occurring peptides as long as eight amino acids can be helical in one protein and a strand in another [25]. We term such peptides structurally ambivalent. It is not known, however, just how far secondary structure formation is influenced by forces other than the sequence's own intrinsic propensity. Nor is it known whether there is a minimum length for an autonomous folding unit based on local interactions. Understanding the degree to which, and the means by which, the environment influences the structural ambivalence of peptides has implications for both protein design and the development of structure prediction methods. This information is also important for elucidating the mechanisms by which proteins fold.
Given the explosive growth of the PDB, the goal of this work is to assess the level of structural ambivalence among peptides with identical sequence in known structures and to examine the origin of their structural diversity.
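To make the notion of a structurally ambivalent peptide concrete, the following is a minimal Python sketch, not the implementation used in this work. It assumes each chain is given as a pair of equal-length strings (amino acid sequence and per-residue secondary structure), that helix and strand residues are coded as `H` and `E`, and that a peptide counts as helical or strand only when every residue in the window carries that state. The function name `find_ambivalent_peptides`, the all-residue criterion, and the toy chains are illustrative assumptions.

```python
from collections import defaultdict


def find_ambivalent_peptides(chains, k=4):
    """Return {peptide: (helix_chain_ids, strand_chain_ids)} for k-mers that are
    fully helical in at least one chain and fully strand in a different chain.

    chains maps a chain id to (sequence, structure), two equal-length strings.
    """
    helix_hits = defaultdict(set)   # peptide -> chains where it is all 'H'
    strand_hits = defaultdict(set)  # peptide -> chains where it is all 'E'

    for chain_id, (sequence, structure) in chains.items():
        if len(sequence) != len(structure):
            raise ValueError(f"length mismatch in chain {chain_id}")
        # Slide a window of length k along the chain.
        for start in range(len(sequence) - k + 1):
            peptide = sequence[start:start + k]
            window = structure[start:start + k]
            if window == "H" * k:
                helix_hits[peptide].add(chain_id)
            elif window == "E" * k:
                strand_hits[peptide].add(chain_id)

    ambivalent = {}
    for peptide in helix_hits.keys() & strand_hits.keys():
        helix_ids = helix_hits[peptide]
        strand_ids = strand_hits[peptide]
        # Require the two conformations to occur in different chains.
        if any(h != s for h in helix_ids for s in strand_ids):
            ambivalent[peptide] = (sorted(helix_ids), sorted(strand_ids))
    return ambivalent


# Toy input: made-up sequences and assignments, not PDB data.
toy_chains = {
    "chainA": ("GAVLKAEEG", "CHHHHHHCC"),
    "chainB": ("TTVLKAYTG", "CEEEEEECC"),
}

print(find_ambivalent_peptides(toy_chains, k=4))
```

In this toy input the only shared 4-mer that is fully helical in one chain and fully strand in the other is `VLKA`. Indexing windows in dictionaries keeps the survey linear in the total number of residues, which matters when scanning a whole structure database.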
COMPUTATIONAL METHODS
Database Survey
The Protein Data Bank (PDB) release of June 1999 was used in this study. Secondary structure assignments were made automatically using the program package STRIDE [26]. One of our goals is the statistical analysis of peptide sequences with structural ambivalence. Thus, to avoid statistical bias caused by the large number of homologous proteins in the PDB, two protein sub-databases were used: one in which all protein-chain pairs have less than 25% sequence identity (DB1) and one in which all protein-chain pairs have less than 95% sequence identity (DB2). DB1 (1106 chains) and DB2 (3295 chains) were taken from the May 1999 version of the "PDB_Select database" [27].
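The pairwise identity criterion that defines DB1 and DB2 can be illustrated with a short Python sketch. This is not the PDB_Select procedure. It is a simple greedy filter that keeps a chain only if its identity to every chain already kept is below the threshold. The function name `select_nonredundant`, the chain ids, and the identity values in the lookup table are all made up for illustration. In practice the identities would come from pairwise sequence alignments.

```python
def select_nonredundant(chain_ids, identity, threshold):
    """Greedily keep chains whose percent identity to every kept chain is
    below threshold. identity(a, b) must return a percentage (0 to 100).
    The result depends on the input order."""
    kept = []
    for chain_id in chain_ids:
        if all(identity(chain_id, other) < threshold for other in kept):
            kept.append(chain_id)
    return kept


# Hypothetical pairwise identities (percent); not measured values.
toy_identity = {
    frozenset(("c1", "c2")): 98.0,
    frozenset(("c1", "c3")): 40.0,
    frozenset(("c1", "c4")): 12.0,
    frozenset(("c2", "c3")): 41.0,
    frozenset(("c2", "c4")): 11.0,
    frozenset(("c3", "c4")): 15.0,
}


def lookup_identity(a, b):
    """Symmetric lookup in the toy table above."""
    return toy_identity[frozenset((a, b))]


chains = ["c1", "c2", "c3", "c4"]
db1_like = select_nonredundant(chains, lookup_identity, threshold=25.0)
db2_like = select_nonredundant(chains, lookup_identity, threshold=95.0)
print(db1_like, db2_like)
```

With these toy values the 25% threshold keeps `c1` and `c4`, while the 95% threshold removes only the near-duplicate `c2`. This mirrors why DB1 is much smaller than DB2.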
The selection of identical pairs of peptide sequences was
performed as follows. First, we surveyed the complete PDB
database selecting all possible sequence pairs with four
identical residues (4-mer). Where possible, these were
Grant sponsor: Australian Research Council.
*Correspondence to: Dr. Gareth Chelvanayagam, Department of
Computer Science, The University of Western Australia, Stirling
Highway, Nedlands, Perth, W.A., 6009, Australia. E-mail:
gareth@cs.uwa.edu.au
Received 13 March 2000; Accepted 8 June 2000
PROTEINS: Structure, Function, and Genetics 41:248–256 (2000)