Solvent Accessible Surface Area-Based Hot-Spot Detection Methods for Protein-Protein and Protein-Nucleic Acid Interfaces

Cristian R. Munteanu, António C. Pimenta, Carlos Fernandez-Lozano, André Melo, Maria N. D. S. Cordeiro, and Irina S. Moreira*,§

Information and Communication Technologies Department, Computer Science Faculty, University of A Coruña, Campus de Elviña s/n, 15071 A Coruña, Spain
REQUIMTE/Departamento de Química e Bioquímica, Faculdade de Ciências da Universidade do Porto, Rua do Campo Alegre s/n, 4169-007 Porto, Portugal
§ CNC - Center for Neuroscience and Cell Biology, Universidade de Coimbra, Rua Larga, FMUC, Polo I, 1° andar, 3004-517 Coimbra, Portugal

Supporting Information

ABSTRACT: Due to the importance of hot-spot (HS) detection and the efficiency of computational methodologies, several HS detection approaches have been developed. This paper presents new models for predicting HS in protein-protein and protein-nucleic acid interactions, with better statistics than those currently reported in the literature. These models are based on solvent accessible surface area (SASA) and genetic conservation features, subjected to simple Bayes networks (protein-protein systems) and to more complex multi-objective genetic algorithm-support vector machine algorithms (protein-nucleic acid systems). The best models for these interactions have been implemented in two free Web tools.

INTRODUCTION

Proteins are essential macromolecules in several biochemical processes due to their ability to perform multiple tasks such as enzymatic catalysis, transport, and signal transduction, among others.¹,² To perform these tasks, proteins have to form complexes with other biomolecules, which is a fundamental step in several biochemical processes. It is therefore crucial to attain a complete understanding of the atomistic structural details of these interactions in order to develop better methods for influencing the binding.
Although protein-based interfaces usually comprise a large number of residues, it has been shown that the majority of the binding energy can be accounted for by the interaction of a small number of residues known as hot-spots (HS).³⁻⁵ To investigate the contribution of a residue to the binding energy, the residue of interest is mutated to an alanine, and the binding free energy difference (ΔΔG_binding) is calculated. The definitions of HS vary among authors, but HS are most commonly defined as residues with ΔΔG_binding ≥ 2.0 kcal mol⁻¹; those with ΔΔG_binding < 2.0 kcal mol⁻¹ are called null-spots (NS).⁵

HS can be found experimentally, using molecular biology and thermodynamic methods upon alanine scanning mutagenesis (ASM), but these are expensive and often complex and time-consuming. Due to these difficulties, computational approaches with a better efficiency-to-cost ratio and shorter turnaround than experiments have been developed. These can be broadly grouped into empirical functions or knowledge-based models, all-atom methods, and feature-based approaches.⁶⁻¹⁰ Although fully atomistic models have been shown to predict HS accurately and to characterize this type of interaction fully, the time and complexity involved are often very high.⁹,¹¹

In our previous work we investigated feature-based methods that combine solvent accessible surface area (SASA) descriptors calculated from static structures and molecular dynamics (MD) ensembles, which were analyzed by a support vector machine (SVM) algorithm.⁶ We presented a new HS predictive model: SASA-based hot-spot detection (SBHD). However, at the time our method was applied to only a small number of complexes, and it was shown to produce a high number of false positives (NS incorrectly detected as HS).
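The labeling rule and the notion of a false positive described above can be illustrated with a minimal sketch. It only assumes the 2.0 kcal mol⁻¹ cutoff quoted in the text; the function names, the ΔΔG values, and the predictions are hypothetical and are not taken from the SBHD model or its data.

```python
HS_THRESHOLD = 2.0  # kcal/mol, the HS/NS cutoff quoted in the text


def label_residue(ddg_binding, threshold=HS_THRESHOLD):
    """Label a residue 'HS' if ddG_binding >= threshold, otherwise 'NS'."""
    return "HS" if ddg_binding >= threshold else "NS"


def confusion_counts(true_labels, predicted_labels):
    """Count TP, FP, TN, FN, treating 'HS' as the positive class."""
    counts = {"TP": 0, "FP": 0, "TN": 0, "FN": 0}
    for truth, pred in zip(true_labels, predicted_labels):
        if pred == "HS":
            # A predicted HS is a false positive when the residue is really an NS.
            counts["TP" if truth == "HS" else "FP"] += 1
        else:
            counts["TN" if truth == "NS" else "FN"] += 1
    return counts


# Hypothetical alanine-scanning ddG values (kcal/mol) and hypothetical predictions.
ddg_values = [3.1, 0.4, 2.0, 1.2, 0.0]
predicted = ["HS", "HS", "HS", "NS", "HS"]

true_labels = [label_residue(v) for v in ddg_values]
counts = confusion_counts(true_labels, predicted)
precision = counts["TP"] / (counts["TP"] + counts["FP"])

print(true_labels)
print(counts, precision)
```

In this toy example two of the four residues predicted as HS are actually NS, which is the kind of false-positive behavior the text refers to.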
To improve this aspect and achieve better overall performance, we have added an extra feature (residue genomic conservation) and significantly extended both the data set and the number of machine learning (ML) techniques used. We have also tested two separate data sets: (i) protein-protein and (ii) protein-nucleic acid complexes. With these additions to the model, we attained a more accurate and time-efficient HS detection methodology. Moreover, our method can be applied not only to protein-protein but also to protein-nucleic acid complexes. This is the first time that such a method has been applied solely to this type of

Received: December 22, 2014. Published: April 6, 2015. J. Chem. Inf. Model. 2015, 55, 1077-1086. DOI: 10.1021/ci500760m. pubs.acs.org/jcim. © 2015 American Chemical Society.