Please cite this article in press as: Xia X, et al. Graph-based generative models for de Novo drug design, Drug Discov Today: Technol (2020), https://doi.org/10.1016/j.ddtec.2020.11.004

DRUG DISCOVERY TODAY: TECHNOLOGIES

Graph-based generative models for de Novo drug design

Xiaolin Xia, Jianxing Hu, Yanxing Wang, Liangren Zhang, Zhenming Liu *

State Key Laboratory of Natural and Biomimetic Drugs, School of Pharmaceutical Sciences, Peking University, Xueyuan Road 38, Haidian District, 100191 Beijing, China

The discovery of new chemical entities is a crucial part of drug discovery, which requires lead compounds to have the desired properties to be pharmaceutically active. De novo drug design aims to generate and optimize novel ligands for macromolecular targets from scratch. The development of graph-based deep generative neural networks has provided a new method for this task. In this review, we give a brief introduction to graph representation and graph-based generative models for de novo drug design, group the models into four architectures, and summarize the characteristics of each. We also discuss generative models for scaffold- and fragment-based design and future directions for graph-based generative models.

Section editors: Johannes Kirchmair, University of Vienna, Department of Pharmaceutical Chemistry, Althanstrasse 14, 1090 Vienna, Austria.

Introduction

The development of new technologies has always had a profound impact on the evolution of drug discovery [1]. Classical pharmacology [2], also known as forward pharmacology, relies on in vitro or in vivo screening to identify substances with desirable therapeutic effects and then to identify and validate their targets. With the development of bioinformatics, especially after the sequencing of the human genome, reverse pharmacology [2], which usually identifies a protein target first and assesses in vivo efficacy last, has become popular.
As a reverse pharmacology method, de novo drug design is the design of bioactive compounds by incremental construction of a ligand model within a model of the receptor or enzyme active site, the structure of which is known from X-ray or nuclear magnetic resonance data (receptor-based design) or inferred from known ligands (ligand-based design) [3].

It has been estimated that the synthesizable chemical space might be as large as 10^60–10^100 molecules, of which 10^23–10^60 [4] could be drug-like compounds, but only 10^8–10^10 have been synthesized. High-throughput screening [5] and high-throughput virtual screening [6] can only search the part of chemical space that is already recorded in databases, whereas de novo drug design has the potential to discover new bioactive compounds. Generative modeling, which learns from chemical databases and generates hypotheses for exploring the part of the iceberg that lies below the surface, can be viewed as a variant of de novo drug design.

Drug Discovery Today: Technologies Vol. xxx, No. xx 2019
Editors-in-Chief: Kelvin Lam, Simplex Pharma Advisors, Inc., Boston, MA, USA; Henk Timmerman, Vrije Universiteit, The Netherlands
*Corresponding authors: L. Zhang (liangren@bjmu.edu.cn), Z. Liu (zmliu@bjmu.edu.cn)
1740-6749/$ © 2020 Elsevier Ltd. All rights reserved.

Recent successes have shown that deep learning can reduce the time and cost of drug discovery [7]. Based on molecular graph representation that bridges real molecules and the data format in computers for deep learning