Solvent Accessible Surface Area-Based Hot-Spot Detection Methods
for Protein-Protein and Protein-Nucleic Acid Interfaces
Cristian R. Munteanu,† António C. Pimenta,‡ Carlos Fernandez-Lozano,† André Melo,‡ Maria N. D. S. Cordeiro,‡ and Irina S. Moreira*,‡,§
†Information and Communication Technologies Department, Computer Science Faculty, University of A Coruna, Campus de Elviña s/n, 15071 A Coruña, Spain
‡REQUIMTE/Departamento de Química e Bioquímica, Faculdade de Ciências da Universidade do Porto, Rua do Campo Alegre s/n, 4169-007 Porto, Portugal
§CNC - Center for Neuroscience and Cell Biology, Universidade de Coimbra, Rua Larga, FMUC, Polo I, 1° andar, 3004-517 Coimbra, Portugal
* S Supporting Information
ABSTRACT: Due to the importance of hot-spot (HS) detection and the efficiency of computational methodologies, several HS detection approaches have been developed. The current paper presents new models to predict HS for protein-protein and protein-nucleic acid interactions, with better statistics than those currently reported in the literature. These models are based on solvent accessible surface area (SASA) and genetic conservation features, subjected to simple Bayes networks (protein-protein systems) and to a more complex multi-objective genetic algorithm-support vector machine approach (protein-nucleic acid systems). The best models for these interactions have been implemented in two free Web tools.
■ INTRODUCTION
Proteins are essential macromolecules in several biochemical processes due to their ability to perform multiple tasks such as enzymatic catalysis, transport, and signal transduction.1,2 To perform these tasks, proteins have to form complexes with other biomolecules, which is a fundamental step for several biochemical processes. It is therefore crucial to attain a complete understanding of the atomistic structural details of these interactions in order to develop better methods to influence binding. Although protein-based interfaces usually comprise a large number of residues, it has been shown that the majority of the binding energy can be accounted for by the interactions of a small number of residues known as hot-spots (HS).3-5
In order to investigate the contribution of a residue to the binding energy, the residue of interest is mutated to an alanine, and the binding free energy difference (ΔΔG_binding) is calculated. The definitions of HS vary among authors, but HS are most commonly defined as residues with ΔΔG_binding ≥ 2.0 kcal mol⁻¹; those with ΔΔG_binding < 2.0 kcal mol⁻¹ are called null-spots (NS).5
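The HS/NS definition above amounts to a simple threshold rule on ΔΔG_binding. A minimal Python sketch of that rule follows; the function name, the residue identifiers, and the ΔΔG values are hypothetical and chosen only for illustration, not taken from the paper's data sets.

```python
# Cutoff quoted in the text: HS if ddG_binding >= 2.0 kcal/mol, otherwise NS.
HS_THRESHOLD = 2.0  # kcal/mol


def classify_residue(ddg_binding: float, threshold: float = HS_THRESHOLD) -> str:
    """Label a residue as a hot-spot ('HS') or null-spot ('NS').

    ddg_binding is the binding free energy difference (kcal/mol)
    measured or computed after mutating the residue to alanine.
    """
    return "HS" if ddg_binding >= threshold else "NS"


# Hypothetical alanine-scanning values (kcal/mol), for illustration only.
example_ddg = {"A:TRP104": 3.1, "A:LYS27": 2.0, "B:SER52": 0.4}

labels = {residue: classify_residue(value) for residue, value in example_ddg.items()}
print(labels)
```

Note that a residue lying exactly at the cutoff is labeled HS, because the definition uses "≥".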
HS can be found experimentally, using molecular biology and thermodynamic methods upon alanine scanning mutagenesis (ASM), but these methods are expensive and often complex and time-consuming. Due to these difficulties, computational approaches with a better efficiency-to-cost ratio and lower time requirements have been developed. These can be generally described as empirical functions or knowledge-based models, all-atom methods, and feature-based approaches.6-10 Although fully atomistic models were shown to predict HS accurately and to fully characterize this type of interaction, the time and complexity involved are often very high.9,11 In our previous work we investigated feature-based methods that combine solvent accessible surface area (SASA) descriptors calculated from static structures and molecular dynamics (MD) ensembles, which were analyzed by a support vector machine (SVM) algorithm.6 We presented a new HS predictive model: SASA-based hot-spot detection (SBHD). However, at the time our method was only applied to a small number of complexes, and it was shown to produce a high number of false positives (NS incorrectly detected as HS). To improve this aspect and achieve a greater overall performance, we have added an extra feature (residue genomic conservation) and significantly extended both the data set and the number of machine learning (ML) techniques used. We have also tested two separate data sets: (i) protein-protein and (ii) protein-nucleic acid complexes.
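To make the notion of a false positive concrete (an NS that a model labels as HS), the following Python sketch counts the four outcomes of a binary HS/NS prediction and derives the false-positive rate and precision. The helper names and the label lists are hypothetical; this is not the evaluation code used by the authors.

```python
def confusion_counts(true_labels, predicted_labels):
    """Count TP, FP, TN, FN, treating 'HS' as the positive class."""
    counts = {"TP": 0, "FP": 0, "TN": 0, "FN": 0}
    for truth, predicted in zip(true_labels, predicted_labels):
        if predicted == "HS":
            counts["TP" if truth == "HS" else "FP"] += 1
        else:
            counts["TN" if truth == "NS" else "FN"] += 1
    return counts


def false_positive_rate(counts):
    """Fraction of true NS residues that were predicted as HS."""
    negatives = counts["FP"] + counts["TN"]
    return counts["FP"] / negatives if negatives else 0.0


def precision(counts):
    """Fraction of predicted HS residues that are true HS."""
    predicted_positive = counts["TP"] + counts["FP"]
    return counts["TP"] / predicted_positive if predicted_positive else 0.0


# Hypothetical experimental labels and model predictions, for illustration only.
true_labels = ["HS", "NS", "NS", "NS", "HS", "NS"]
predicted_labels = ["HS", "HS", "NS", "HS", "NS", "NS"]

counts = confusion_counts(true_labels, predicted_labels)
print(counts, false_positive_rate(counts), precision(counts))
```

A model with many false positives shows a high false-positive rate and a low precision, which is the weakness described for the earlier version of the method.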
With these additions to the model, we attained a more accurate and time-efficient HS detection methodology. Moreover, our method can be applied not only to protein-protein but also to protein-nucleic acid complexes. This is the first time that such a method has been applied solely to this type of
Received: December 22, 2014
Published: April 6, 2015
Article
pubs.acs.org/jcim
© 2015 American Chemical Society. DOI: 10.1021/ci500760m
J. Chem. Inf. Model. 2015, 55, 1077-1086