J. Chem. Inf. Comput. Sci. 1995, 35, 479-493

SPROUT: 3D Structure Generation Using Templates

Paulina Mata,† Valerie J. Gillet,‡ A. Peter Johnson,*,‡ Jorge Lampreia,† Glenn J. Myatt,‡ Sandor Sike,§ and Anna L. Stebbings‡

Departamento de Química, Faculdade de Ciências e Tecnologia, Universidade Nova de Lisboa, Quinta da Torre, 2825 Monte da Caparica, Portugal, School of Chemistry, University of Leeds, Leeds LS2 9JT, U.K., and Eötvös Loránd University, Általános Számítástudományi Tanszék, Budapest, Bogdánfy u. 10/b, Hungary

Received July 2, 1994⊗

SPROUT is a computer program for the rational design of molecules for a range of applications in molecular recognition. Molecular graphs are built in a stepwise fashion by subgraph addition. Several heuristics are being explored to restrict the combinatorial explosion that is inherent in structure generation. These include the use of generalized molecular fragments, called templates, as building blocks. Structure generation consists of two stages: (i) the generation of skeletons from templates that satisfy steric constraints and (ii) the substitution of heteroatoms into skeletons to produce molecules that satisfy other constraints such as electrostatics. The choice and definition of the templates and the template joining rules are described, together with the atom substitution process.

INTRODUCTION

SPROUT¹⁻³ is a program designed to generate molecules appropriate to a wide range of applications in molecular recognition, e.g., the de novo design of enzyme inhibitors, catalysts, or agents for asymmetric synthesis. One of the main problems in de novo design is the combinatorial explosion that is inherent in structure generation; attempts at finding solutions quickly lead to a large number of possibilities.
This type of problem has been well studied in artificial intelligence, where methods have been devised for delaying and moderating the combinatorial explosion.⁴ Generally, efficient solution methods require knowledge about the problem domain to direct the search, i.e., heuristics. In our approach several heuristics are being explored. This paper is concerned with heuristics that are derived from chemical knowledge about commonly occurring substructural fragments. These heuristics have led to the definition of a set of templates that are used as building blocks for structure generation, and to the definition of a set of rules that describe how the templates can be joined in the process of structure generation.

† Universidade Nova de Lisboa.
‡ University of Leeds.
§ Eötvös Loránd University.
⊗ Abstract published in Advance ACS Abstracts, January 15, 1995.
0095-2338/95/1635-0479$09.00/0 © 1995 American Chemical Society

STRUCTURE GENERATION

Structure generation techniques can be applied to the problem of de novo structure design, where the aim is to build a wide range of molecules with a given set of steric and chemical properties. Structures can be generated in a brute-force approach by beginning with a single atom and then sequentially adding one atom and bond at a time to the growing partial structures. This approach has been used in a number of programs that have been described in the literature.⁵⁻⁷ However, building all possible solution structures in this way is computationally infeasible because of the combinatorial explosion of possibilities that would result. The computational effort can be reduced to some extent by using molecular fragments at each addition step rather than single atoms.
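The scale of the saving can be pictured with a deliberately crude counting model. The sketch below is an illustration only, not part of SPROUT: it assumes a fixed, hypothetical number of choices at every addition step (atom type, bond type, and attachment point lumped together), ignores duplicate structures and the choice of starting unit, and simply counts growth paths. The function names and all numeric values are assumptions made for this example.

```python
from math import ceil


def atom_by_atom_paths(n_atoms: int, choices_per_step: int) -> int:
    """Count growth paths when one atom (and bond) is added per step.

    Toy model: every step offers the same number of choices, and the
    first atom is taken as a fixed starting point.
    """
    steps = n_atoms - 1
    return choices_per_step ** steps


def fragment_paths(n_atoms: int, fragment_size: int, choices_per_step: int) -> int:
    """Count growth paths when each step adds a whole fragment.

    Toy model: all fragments have `fragment_size` atoms, and the first
    fragment is taken as a fixed starting point.
    """
    steps = ceil(n_atoms / fragment_size) - 1
    return choices_per_step ** steps


if __name__ == "__main__":
    # Hypothetical numbers: 10 choices per atom step, 50 per fragment step.
    for n in (6, 12, 18):
        print(n, atom_by_atom_paths(n, 10), fragment_paths(n, 6, 50))
```

In this model the exponent shrinks because fewer addition steps are needed, but the base grows with the size of the fragment library, which is one way to read "reduced to some extent": a large enough library brings the explosion back.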
The generation of chemical structures in 2D through the joining of substructures has already been used successfully in several domains.⁸⁻¹⁰ In the case of 3D structures, the principle underlying the process is that the joining of a set of conformationally reasonable substructures can result in a molecule that also has a reasonable conformation. This approach has been used successfully in the WIZARD¹¹ and COBRA¹² programs for conformational analysis. For de novo structure design, however, an enormous number of fragments is required to enable a wide range of molecules to be produced, even when the fragments are restricted to low-energy conformations. Thus exhaustive searching of structure space using molecular fragments is also prohibitive, and the programs described in the literature that use this method¹³,¹⁴ use different techniques to sample structure space. A further class of programs for de novo structure design operates by positioning fragments at the interaction sites within the active site of an enzyme and then linking them by searching for fragments within a [...]

SPROUT

SPROUT¹⁻³ uses information about one molecule to constrain the design of others with which it can interact. SPROUT generates structures from fragments; however, heuristics are used to reduce the problem so that a representative search can be made over all structure space and a wide range of different classes of structures can be generated. The main factor in reducing the problem to a manageable size is the use of generalized fragments, or templates, as building blocks, along with an associated set of rules for controlling the ways in which the templates can be joined. A brief outline of the program is given before focusing on the main topic of this paper, i.e., a detailed description of the templates and the template joining process.

In SPROUT, the receptor site is used to define a volume for structure generation and some target sites within the volume.
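As a rough illustration of these two inputs, the sketch below models the generation volume as an axis-aligned box and each target site as a small sphere, then checks whether a set of atom positions stays inside the volume and places at least one atom in every target site. These geometric choices, the class and function names, and the coordinates are all assumptions made for the example; they are not SPROUT's actual data structures, and atom types and atomic radii are ignored.

```python
from dataclasses import dataclass
from math import dist
from typing import List, Sequence, Tuple

Point = Tuple[float, float, float]


@dataclass(frozen=True)
class Box:
    """Hypothetical generation volume: an axis-aligned box."""
    lo: Point
    hi: Point

    def contains(self, p: Point) -> bool:
        # A point is inside when every coordinate lies within the bounds.
        return all(l <= x <= h for l, x, h in zip(self.lo, p, self.hi))


@dataclass(frozen=True)
class TargetSite:
    """Hypothetical target site: a small sphere inside the volume."""
    center: Point
    radius: float


def unmet_sites(atoms: Sequence[Point], sites: Sequence[TargetSite]) -> List[TargetSite]:
    """Return the target sites that contain no atom centre."""
    return [
        s for s in sites
        if not any(dist(a, s.center) <= s.radius for a in atoms)
    ]


def acceptable(atoms: Sequence[Point], volume: Box, sites: Sequence[TargetSite]) -> bool:
    """True when all atoms lie in the volume and every target site is occupied."""
    inside = all(volume.contains(a) for a in atoms)
    return inside and not unmet_sites(atoms, sites)


if __name__ == "__main__":
    volume = Box((0.0, 0.0, 0.0), (10.0, 10.0, 10.0))
    sites = [TargetSite((2.0, 2.0, 2.0), 0.5), TargetSite((8.0, 8.0, 8.0), 0.5)]
    atoms = [(2.0, 2.0, 2.3), (5.0, 5.0, 5.0), (8.0, 8.2, 8.0)]
    print(acceptable(atoms, volume, sites))
```

A check of this kind would act as a filter on candidate structures; how SPROUT itself represents the volume and evaluates target sites is described in the paper, not here.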
Target sites are small regions of space where it is desirable to place a ligand atom in order to promote an interaction, such as a hydrogen bond, between the ligand and the receptor. In SPROUT, structure generation has been