Current Medicinal Chemistry, 2017, 24, 2459-2470
© 2017 Bentham Science Publishers

REVIEW ARTICLE

Supervised Machine Learning Methods Applied to Predict Ligand-Binding Affinity

Gabriela S. Heck a, Val O. Pintro a, Richard R. Pereira a, Mauricio B. de Ávila a,b, Nayara M.B. Levin a,b and Walter F. de Azevedo Jr. a,b,*

a Laboratory of Computational Systems Biology, Faculty of Biosciences - Pontifical Catholic University of Rio Grande do Sul (PUCRS), Av. Ipiranga, 6681, Porto Alegre-RS 90619-900, Brazil; b Graduate Program in Cellular and Molecular Biology, Faculty of Biosciences - Pontifical Catholic University of Rio Grande do Sul (PUCRS), Av. Ipiranga, 6681, Porto Alegre-RS 90619-900, Brazil

ARTICLE HISTORY
Received: December 20, 2016
Revised: May 23, 2017
Accepted: June 06, 2017
DOI: 10.2174/0929867324666170623092503

Abstract: Background: Calculation of ligand-binding affinity is an open problem in computational medicinal chemistry. The ability to computationally predict affinities has a beneficial impact in the early stages of drug development, since it allows a mathematical model to assess protein-ligand interactions. Due to the availability of structural and binding information, machine learning methods have been applied to generate scoring functions with good predictive power.
Objective: Our goal here is to review recent developments in the application of machine learning methods to predict ligand-binding affinity.
Method: We focus our review on the application of computational methods to predict binding affinity for protein targets. In addition, we also describe the major available databases for experimental binding constants and protein structures.
Furthermore, we explain the most successful methods to evaluate the predictive power of scoring functions.
Results: Association of structural information with ligand-binding affinity makes it possible to generate scoring functions targeted to a specific biological system. Through regression analysis, these data can be used as a basis to generate mathematical models to predict ligand-binding affinities, such as inhibition constant, dissociation constant and binding energy.
Conclusion: Experimental biophysical techniques have determined the structures of over 120,000 macromolecules. Together with the growing body of binding affinity information, this creates a promising scenario for the development of scoring functions based on machine learning techniques. Recent developments in this area indicate that scoring functions targeted to the biological system of interest show superior predictive performance compared with other approaches.

Keywords: Machine learning, medicinal chemistry, binding affinity, regression, drug, enzyme, ligand-binding affinity.

*Address correspondence to this author at the Laboratory of Computational Systems Biology, Faculty of Biosciences - Pontifical Catholic University of Rio Grande do Sul (PUCRS), Av. Ipiranga, 6681, Porto Alegre-RS 90619-900, Brazil; E-mail: walter.junior@pucrs.br

1. INTRODUCTION

The application of machine learning (ML) techniques is not new to studies in computational medicinal chemistry and systems biology. A literature search in PubMed conducted on May 22nd, 2017 using the keywords “machine learning” and “biology” returned 2266 scientific publications, as shown in Fig. (1). The oldest report dates back to 1985 [1]. In this list of publications, the first report to use the term “Machine learning” in the paper title came out in 1988 [2]. There are examples of the application of such methods to a wide variety of biological problems.
For instance, the use of artificial neural networks to model complex biological data [3], the application of a weighted variant of the K-nearest neighbor (KNN) algorithm to analyze the protein