Milk-Way algorithm for ligand-based virtual screening: CDK2 case study

ABSTRACT

Ligand-based screening of large molecular databases can help reduce experimental costs by filtering and ranking promising compounds at an early stage of the drug development process. However, some ligand-based methods become ineffective when presented with a large number of attributes (high-dimensional data) extracted from an extensive dataset of compounds. Herein, we propose a drug-mining algorithm that can be used to screen ligands and repurpose known drugs, from any dataset and for any target. The Milk-Way algorithm combines mathematical and regression methods to select promising compounds from a high-dimensional dataset without the use of massive computational power. We carried out a prospective screening targeting cyclin-dependent kinase 2 (CDK2), an attractive target for therapeutics designed to arrest or recover control of the cell cycle. The combined use of the algorithm metrics and molecular docking suggested five promising drugs for repositioning (Pramocaine, Prochlorperazine, Trifluoperazine, Methionine, and Pergolide), three of which have already been mentioned in the literature as possible inhibitors in related diseases.

KEYWORDS: algorithm, drug discovery, drug repurposing, ligand-based virtual screening, logistic regression, machine learning, development.

1. INTRODUCTION

We present a new algorithm to screen novel compounds, using CDK2 as the target. CDK2 is an enzyme that phosphorylates many proteins involved in cell cycle progression, DNA replication, histone synthesis, and centrosome duplication, among other processes [1, 2]. Because of these functions, CDK2 represents an attractive target for therapeutics designed to arrest or recover control of the cell cycle in dividing cells [3], and since the enzyme is not essential for the cell cycle, the toxicity associated with its inhibition is not severe [4]. Despite the importance of the CDK2 protein, few commercial drugs act against it.
Thus, we investigated the use of drug repurposing as an aid to CDK2 drug development. Drug repurposing is the strategy of finding new uses or indications for approved drugs, which makes it possible both to assess the effects of a drug on a new target and to reduce the cost of developing a new drug.

Computational approaches, such as virtual screening (VS), have emerged as alternatives for screening large libraries of small molecules in a cost-efficient manner. Although VS approaches do not replace experimental assays, they can speed up and rationalize the process of drug discovery, enriching the hit rate by reducing the number of candidates to be tested [5, 6].

Carmelina Figueiredo Vieira Leite 1,*, Lucianna Helene Silva Santos 2, Larissa Fernandes Leijôto 1, Diego César Batista Mariano 1, Rafael Eduardo Oliveira Rocha 2 and Marcos Augusto dos Santos 3
1 Laboratory of Bioinformatics and Systems; 2 Department of Biochemistry and Immunology; 3 Department of Computer Science, Universidade Federal de Minas Gerais, Belo Horizonte, 31270-901, Brazil.
*Corresponding author: cleite@ufmg.br
Trends in Developmental Biology Vol. 13, 2020

In structure-based virtual screening (SBVS), the three-dimensional structure of the target is known, from X-ray crystallographic, NMR, or computational