Progress in Predicting Inter-Residue Contacts of Proteins
With Neural Networks and Correlated Mutations
Piero Fariselli,¹ Osvaldo Olmea,² Alfonso Valencia,² and Rita Casadio¹*
¹CIRB and Department of Biology, University of Bologna, Bologna, Italy
²Protein Design Group, CNB-CSIC Cantoblanco, Madrid, Spain
ABSTRACT This article presents recent progress in predicting inter-residue contacts of proteins with a neural network-based method. Improvement over the results obtained at the previous CASP3 competition is attained by using as input to the network a complex code, which includes evolutionary information, sequence conservation, correlated mutations, and predicted secondary structures. The predictor was trained and cross-validated on a data set comprising the contact maps of 173 non-homologous proteins as computed from their well-resolved three-dimensional structures. The method could assign protein contacts with an average accuracy of 0.21 and with an improvement over a random predictor of a factor greater than 6, which is higher than that previously obtained with methods based only on either neural networks or correlated mutations. Although far from ideal, these scores are the highest reported so far for predicting protein contact maps. On 29 targets automatically predicted by the server (CORNET), the average accuracy is 0.14. The predictor performs poorly on all-α proteins, which are not represented in the training set. On all-β and mixed proteins (22 targets) the average accuracy is 0.16. This set comprises proteins of different complexity and different chain length, suggesting that the predictor is capable of generalization over a broad range of features.
Proteins 2001;Suppl 5:157–162. © 2002 Wiley-Liss, Inc.
Key words: protein structure predictions; contact maps; correlated mutations; neural networks; residue contacts
INTRODUCTION
A useful two-dimensional representation of a protein three-dimensional (3D) structure is its contact map.[1] Secondary structures are easily detected from the contact map. α-Helices appear as thick bands along the main diagonal, involving contacts between residues in positions i and i+4. Bands offset from the main diagonal and running parallel or perpendicular to it are the distinguishing marks of parallel or antiparallel β-sheets, respectively. The remaining contacts in the representation are sparse and/or clustered in segregated areas, depending on the structural complexity of the protein.
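The helix signature described above can be checked on a toy model. The following is a minimal sketch, not part of the published method: it builds an idealized α-helix trace from textbook geometry (about 100° rotation and 1.5 Å rise per residue, Cα radius of about 2.3 Å) and measures how many residue pairs at a given sequence offset fall under a distance cutoff. The function names and the 8 Å cutoff are assumptions chosen for illustration only.

```python
import math


def ideal_helix_ca(n, radius=2.3, rise=1.5, turn_deg=100.0):
    """Return n points on an idealized alpha-helix (assumed textbook geometry)."""
    points = []
    for k in range(n):
        angle = math.radians(turn_deg * k)
        points.append((radius * math.cos(angle), radius * math.sin(angle), rise * k))
    return points


def band_occupancy(coords, offset, threshold=8.0):
    """Fraction of residue pairs (i, i+offset) closer than `threshold` (in angstroms).

    The 8 A threshold is an illustrative assumption, not a value from the text.
    """
    pairs = [(i, i + offset) for i in range(len(coords) - offset)]
    hits = sum(math.dist(coords[i], coords[j]) < threshold for i, j in pairs)
    return hits / len(pairs)


helix = ideal_helix_ca(20)
for offset in range(1, 8):
    # Occupancy near 1 means a filled diagonal band at this offset.
    print(offset, band_occupancy(helix, offset))
```

With these assumed parameters the bands at offsets 1 to 4 are fully occupied and the bands beyond offset 4 are empty, which corresponds to a thick band hugging the main diagonal of the map.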
In real proteins, the number of contacts scales linearly with the chain length.[2–4] The slope of the linear dependence depends on the contact definition.[3] Various ways have been used to define contacts. Routinely, a contact is said to exist between a pair of residues whenever their mutual distance is below a given arbitrary threshold. The distance involved in the different definitions of a contact can be that between the Cα-Cα atoms,[3] that between the Cβ-Cβ atoms,[2,5,6] or the minimal distance between atoms belonging to the side chain or to the backbone of the two residues.[4]
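The threshold-based contact definition can be sketched in a few lines. This is a hedged illustration rather than the CORNET implementation: it takes one representative 3D point per residue (for example a Cα or Cβ position), marks a contact whenever two points are closer than a cutoff, and counts the contacts. The names `contact_map`, `count_contacts`, and `straight_chain`, the 8 Å cutoff, and the 3.8 Å spacing of the toy chain are all assumptions made for the example.

```python
import math


def contact_map(coords, threshold=8.0, min_separation=1):
    """Symmetric boolean contact map from one 3D point per residue.

    `threshold` (angstroms) is the arbitrary cutoff of the contact
    definition; 8.0 is an assumed placeholder, not a value from the text.
    """
    n = len(coords)
    cmap = [[False] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + min_separation, n):
            if math.dist(coords[i], coords[j]) < threshold:
                cmap[i][j] = cmap[j][i] = True
    return cmap


def count_contacts(cmap):
    """Count each contacting residue pair once (upper triangle only)."""
    n = len(cmap)
    return sum(cmap[i][j] for i in range(n) for j in range(i + 1, n))


def straight_chain(n, spacing=3.8):
    """Toy fully extended chain on a line (hypothetical test geometry)."""
    return [(spacing * k, 0.0, 0.0) for k in range(n)]


for length in (10, 20, 40):
    print(length, count_contacts(contact_map(straight_chain(length))))
```

Changing `threshold` or the choice of representative atom changes how many pairs qualify as contacts, which is why the slope of the contacts-versus-length relation depends on the contact definition. The straight toy chain is only a sanity check and does not reproduce the slope observed for real proteins.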
If the true physical contact map of a protein is known, it is possible to recover its 3D structure. The similarity to the native structure remains rather good (low root-mean-square deviation, RMSD, to the crystal structure) even when the number of true contacts is reduced by a factor of two.[3]
A relevant issue is, therefore, whether it is possible to predict the contact map of a protein starting from the residue sequence and, most importantly, to what extent the prediction can be useful for reconstructing the protein structure. In this article we focus on the accuracy of the prediction of contact maps that can be obtained with our predictor (CORNET) and highlight some future perspectives for this ab initio procedure.
MATERIALS AND METHODS
We developed CORNET, a predictor that is essentially based on neural networks. The system was trained to learn the association rules between the covalent structure of each protein belonging to a selected database and its contact map. What is new in the present version is the input coding, which is considerably more complex than those previously used for the same task. CORNET was specifically designed to include evolutionary information in the form of sequence profile, sequence conservation, correlated mutations, and predicted secondary structures. We were prompted to modify the input coding by the results obtained at CASP3.[7] A brief description of the method is outlined below.
Grant sponsor: Ministero della Università e della Ricerca Scientifica e Tecnologica; Grant sponsor: Italian Centro Nazionale delle Ricerche (Target: Biotechnology).
O. Olmea’s present address is Department of Physiology and
Biophysics, Mount Sinai School of Medicine, New York, NY.
*Correspondence to: Rita Casadio, Department of Biology, Via
Irnerio 42, I-40126, Bologna, Italy. E-mail: casadio@alma.unibo.it
Received 27 March 2001; Accepted 2 July 2001
Published online 28 January 2002
PROTEINS: Structure, Function, and Genetics Suppl 5:157–162 (2001)
DOI 10.1002/prot.1173