#56 - Binding affinity prediction using a nonparametric regression model based on physicochemical and structural descriptors of the nano-environment for protein-ligand interactions

Luiz Borro (1,2), Inacio Yano (2), Ivan Mazoni (2), Goran Neshich (2)
(1) University of Campinas
(2) Embrapa Agriculture Informatics

ABSTRACT

We propose a new empirical scoring function for binding affinity prediction, built on physicochemical and structural descriptors that characterize the nano-environment encompassing both the ligand and the binding pocket residues. Our hypothesis is that describing the nano-environment of a protein-ligand complex as precisely as possible can improve binding affinity prediction. A similar hypothesis has already been shown to hold for the nano-environments of protein-protein interfaces [1] and of catalytic site residues (yet to be published).

INTRODUCTION

In structure-based virtual screening campaigns, protein-ligand complexes generated in silico are evaluated and ranked according to their estimated binding affinities. The ranking step is normally performed with scoring functions, i.e. mathematical models that assess the strength of interaction between two binding partners. However, scoring functions are generally weak predictors of binding affinity, mostly because they fail to model the polar aspects of the protein-ligand interaction properly [2]. To improve binding affinity prediction, we propose an empirical nonparametric predictive model derived from physicochemical and structural descriptors that characterize the nano-environment encompassing both the ligand atoms and the binding pocket residues.

METHODS

Datasets. To ensure an unbiased performance comparison with related approaches, we used the PDBbind v2007 refined set, which comprises 1300 diverse protein-ligand complexes with high-quality structural and binding data.
The refined set was split into two disjoint sets: a training set of 1105 complexes, used for fitting the predictive models, and a test set of 195 complexes (known as the core set), used for performance evaluation.

Protein-ligand complex characterization. Each protein-ligand complex is represented by physicochemical and structural parameters of the nano-environment covering the ligand atoms and the binding pocket residues. To obtain a more detailed characterization, special attention was given to descriptors related to the hydrophobic effect and to the polar aspects of protein-ligand binding. The 22 descriptors were divided into three classes: Ligand-Only (7 descriptors), Protein-Only (6 descriptors) and Protein-Ligand (9 descriptors), as shown in Table 1. Protein-Only and Protein-Ligand descriptors were calculated with the STING platform [3], whereas the Ligand-Only descriptors were calculated using Biovia Pipeline Pilot.

Table 1. List of descriptors used to characterize protein-ligand complexes.

Class: Descriptors
Ligand-Only: Volume, Polar Solvent-Accessible Surface Area, Strain Energy, Number of Hydrogen Bond (HB) Donors, Number of HB Acceptors, AlogP, Number of Rotatable Bonds
Protein-Only: Hydrophobicity, Electrostatic Potential @ Surface, Unused Contacts Energy (HB, Charged, Hydrophobic, Aromatic)
Protein-Ligand: Protein-Ligand Interaction (HB, Charged, Hydrophobic, Aromatic), Ligand Buried Surface, Energy Density, Sponge, Density, Protein Hydrophobicity Variation

Binding affinity prediction model. Using the descriptors listed in Table 1 and the experimental pKi values of the training set complexes as input data, the binding affinity predictive model (herein called STING SF) was trained as a random forest regression model.

RESULTS & CONCLUSIONS

STING SF's performance was evaluated on the PDBbind v2007 benchmark. Table 2 presents a performance comparison between STING SF and the top four scoring functions previously tested on the same benchmark.
Our predictive model ranks among the best in terms of correlation with experimental binding affinity; its R_P is only slightly lower than that of RF-Score::Elem-v2. A statistical analysis of the contribution of each descriptor to the predictive model showed that the most important descriptors are related to shape complementarity (Ligand Buried Surface), the hydrophobic effect (Hydrophobicity, AlogP) and polarity (Polar Solvent-Accessible Surface Area, Electrostatic Potential @ Surface). This result suggests that STING SF may be further improved by expanding the characterization of protein-ligand complexes with additional descriptors of hydrophobic and polar complementarity. Finally, considering STING SF's performance on the PDBbind v2007 benchmark, the de facto standard for validation of scoring functions, we believe that our binding affinity predictive model can be a viable option for rescoring in virtual screening campaigns.
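
As an illustration of the evaluation metric, the sketch below computes a correlation between predicted and experimental binding affinities. It assumes that R_P, the correlation measure named in the results, denotes the Pearson correlation coefficient, as is conventional for the PDBbind benchmark. The function name `pearson_r` and the affinity values are hypothetical and chosen for illustration only; they are not data from this work.

```python
import math


def pearson_r(predicted, experimental):
    """Pearson correlation coefficient between two equal-length sequences."""
    if len(predicted) != len(experimental):
        raise ValueError("sequences must have the same length")
    n = len(predicted)
    if n < 2:
        raise ValueError("need at least two complexes")

    mean_p = sum(predicted) / n
    mean_e = sum(experimental) / n

    # Sum of co-deviations and sums of squared deviations from the means.
    cov = sum((p - mean_p) * (e - mean_e)
              for p, e in zip(predicted, experimental))
    var_p = sum((p - mean_p) ** 2 for p in predicted)
    var_e = sum((e - mean_e) ** 2 for e in experimental)

    if var_p == 0 or var_e == 0:
        raise ValueError("correlation is undefined for constant input")
    return cov / math.sqrt(var_p * var_e)


# Made-up affinities (illustration only, NOT benchmark data).
experimental_pk = [4.2, 5.1, 6.3, 7.0, 8.4]
predicted_pk = [4.8, 5.0, 6.0, 7.4, 7.9]

r_p = pearson_r(predicted_pk, experimental_pk)
print(f"R_P = {r_p:.3f}")
```

In a benchmark setting, `predicted_pk` would hold the model's predictions for the 195 core set complexes and `experimental_pk` their measured affinities; a value closer to 1 indicates a stronger linear agreement between the two.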