Improved PEP-FOLD Approach for Peptide and Miniprotein Structure Prediction
Yimin Shen,†,§,# Julien Maupetit,‡,§,# Philippe Derreumaux,‡,¶,§,∥ and Pierre Tufféry*,†,§,#
† INSERM U973, MTi, F-75205 Paris, France
‡ Laboratoire de Biochimie Théorique, UPR 9080 CNRS, Institut de Biologie Physico-Chimique, F-75005 Paris, France
¶ Institut Universitaire de France, 103 Boulevard Saint-Michel, 75005, Paris, France
§ Univ Paris Diderot, Sorbonne Paris Cité, F-75205 Paris, France
Supporting Information
ABSTRACT: Peptides and miniproteins have many biological and biomedical implications, which motivates the development of accurate methods, suitable for large-scale experiments, to predict their experimental or native conformations solely from sequences. In this study, we report PEP-FOLD2, an improved coarse-grained approach for de novo peptide structure prediction, and compare it with PEP-FOLD1 and the state-of-the-art Rosetta program. Using a benchmark of 56 structurally diverse peptides with 25−52 amino acids and a total of 600 simulations for each system, PEP-FOLD2 generates higher quality models than PEP-FOLD1, and PEP-FOLD2 and Rosetta generate near-native or native models for 95% and 88% of the targets, respectively. When no experimental structure is at hand, PEP-FOLD2 and Rosetta return a near-native or native conformation among the five best-scored models for 80% and 75% of the targets, respectively. Although the PEP-FOLD2 prediction rate exceeds the Rosetta prediction rate by only 5 percentage points, this improvement is non-negligible because PEP-FOLD2 explores a larger conformational space than Rosetta and consists of a single coarse-grained phase. Our results indicate that, although the coarse-grained PEP-FOLD2 method is approaching maturity, miniprotein structure prediction is not yet a solved problem; nevertheless, this work opens new perspectives for large-scale in silico experiments.
■ INTRODUCTION
Fast and accurate peptide structure characterization remains a long-standing goal in structural biology and peptide engineering, since peptides of up to 50 amino acids represent a source of novel antibiotics and therapeutics.[1] In addition, polypeptides of this size can fold autonomously and be the functional centers of full-length proteins (e.g., the C1, UBA, and WW domains, to cite some).[2−4] One major obstacle in predicting peptide structures, in contrast to larger proteins, is that only a small number of solution structures have been characterized and are available in structural databases. As of October 1, 2013, the number of entries in the Protein Data Bank (PDB)[5] corresponding to isolated proteins of fewer than 51 amino acids was 2057, and only 799 of them had less than 30% sequence identity and structures not solved in a membrane environment. In addition, de novo sequences can differ from those in the PDB by more than 70% in sequence identity, making comparative modeling techniques unreliable when no experimental information is available. For instance, it is remarkable that the de novo peptides designed with the helix-turn-helix motif in 2004 (PDB 1vrz) and with the beta-alpha-beta motif in 2009 (PDB 2ki0) still do not have any homologue in the PDB.
Considering the number of new sequences delivered by each genome project, we need to go beyond time-consuming simulations of all-atom systems in explicit solvent, even though molecular dynamics studies have succeeded in folding structurally diverse proteins of 10−80 amino acids using the specially designed Anton computer[6] or the Folding@home project.[7] Present estimates of the number of hypothetical peptide coding sequences in the complete prokaryotic genomes available today are on the order of 1.5 million.[57] In eukaryotes, the number of peptide candidates is even higher, with estimates of the number of venom peptides on the order of 12 million.[8] This highlights the need for fast approaches to model the structure of peptides and small proteins.
The most efficient and rapid methods are multiscale in character. Such methods start sampling with low-resolution models, use fragment assembly (FA) methods, and then select some conformations for subsequent full-atom refinement. These include the widely used Rosetta,[9,10] I-Tasser,[11] and Quark[12] methods. Other Web servers include PepStr,[13] Bhageerath,[14] and Peplook.[15] Other programs such as Zipping and Assembly,[16] the AWSEM-based approach,[17] conformational space annealing,[18] GPS,[19] and replica exchange molecular dynamics (REMD) simulations with OPEP[20] are not open and
Received: April 5, 2014
Published: August 20, 2014
Article
pubs.acs.org/JCTC
© 2014 American Chemical Society 4745 dx.doi.org/10.1021/ct500592m | J. Chem. Theory Comput. 2014, 10, 4745−4758