An Ontology for Pharmaceutical Ligands and Its Application for in Silico Screening and Library Design

Ansgar Schuffenhauer,*,† Jürg Zimmermann,‡ Ruedi Stoop,§ Jan-Jan van der Vyver,§ Steffano Lecchini,§ and Edgar Jacoby*,†

Novartis Pharma AG, Drug Discovery Center, Compound Management and Computation Unit, CH-4002 Basel, Switzerland, Novartis Pharma AG, Central Technologies, Combinatorial Chemistry Unit, CH-4002 Basel, Switzerland, and Institute of Neuroinformatics, University/ETH Zürich, Winterthurerstrasse 190, CH-8057 Zürich, Switzerland

Received November 9, 2001

Annotation efforts in the biosciences have in recent years focused mainly on the annotation of genomic sequences. Only very limited effort has been put into annotation schemes for pharmaceutical ligands. Here we propose annotation schemes for the ligands of four major target classes, enzymes, G protein-coupled receptors (GPCRs), nuclear receptors (NRs), and ligand-gated ion channels (LGICs), and outline their usage for in silico screening and combinatorial library design. The proposed schemes cover ligand functionality and hierarchical levels of target classification. The classification schemes are based on those established by the EC, GPCRDB, NuclearDB, and LGICDB. The ligands of the MDL Drug Data Report (MDDR) database serve as a reference data set of known pharmacologically active compounds. All ligands were annotated according to the schemes wherever the activity classification provided by the reference database allowed an assignment. The purpose of the ligand-target classification schemes is to allow annotation-based searching of the ligand database. In addition, the biological sequence information of the target can be linked directly to the ligand, thereby allowing sequence similarity-based identification of ligands of the most closely homologous receptors.
Ligands of specified levels can easily be retrieved to serve as comprehensive reference sets for cheminformatics-based similarity searches and for the design of target-class-focused compound libraries. Retrospective in silico screening experiments within the MDDR01.1 database, searching for structures binding to dopamine D2, all dopamine receptors, and all amine-binding class A GPCRs using known dopamine D2 binding compounds as a reference set, have shown that such reference sets are particularly useful for the identification of ligands binding to receptors closely related to the reference system. The potential for ligand identification drops with increasing phylogenetic distance. The analysis of the focus of a tertiary-amine-based combinatorial library compared to known amine-binding class A GPCR ligands, peptide-binding class A GPCR ligands, and LGIC ligands constitutes a second application scenario, which illustrates how the focus of a combinatorial library can be treated quantitatively. The provided annotation schemes, which bridge chem- and bioinformatics by linking ligands to sequences, are expected to be of key utility for further systematic chemogenomics exploration of previously well-explored target families.

INTRODUCTION

The immediate impact of the completion of the human genome project on the drug discovery process is its further systematization. All targets of a particular gene family are now visible, and systematic exploration of selected target families without a priori restriction to a specific therapeutic area appears to be a promising way to speed up the lead finding process. Beyond target validation, the challenge reverts to medicinal chemistry to find ligands for the sequences and to provide the molecules with which their novel biology and pharmacology can be studied.
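The annotation-based retrieval summarized in the abstract can be illustrated with a minimal Python sketch. It assumes that each ligand annotation stores its target as a path through a classification hierarchy, so that selecting ligands of a specified level amounts to a prefix match on that path. The names Annotation and ligands_at_level, the compound identifiers, and the hierarchy levels shown are hypothetical and are not taken from the annotation schemes or from the MDDR database.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Annotation:
    """One ligand-target annotation (illustrative structure, not the paper's schema)."""
    ligand_id: str
    target_path: tuple  # hierarchy from target class down to the individual receptor
    function: str       # ligand functionality, e.g. "agonist" or "antagonist"


# Toy data: identifiers and classification levels are invented for illustration.
ANNOTATIONS = [
    Annotation("cpd-001", ("GPCR", "Class A", "Amine", "Dopamine", "D2"), "antagonist"),
    Annotation("cpd-002", ("GPCR", "Class A", "Amine", "Dopamine", "D1"), "agonist"),
    Annotation("cpd-003", ("GPCR", "Class A", "Amine", "Serotonin", "5-HT1A"), "agonist"),
    Annotation("cpd-004", ("GPCR", "Class A", "Peptide", "Opioid", "mu"), "agonist"),
    Annotation("cpd-005", ("LGIC", "Cys-loop", "Nicotinic", "alpha7"), "antagonist"),
]


def ligands_at_level(annotations, prefix):
    """Return sorted ligand IDs whose target path starts with the given prefix."""
    prefix = tuple(prefix)
    return sorted(
        {a.ligand_id for a in annotations if a.target_path[:len(prefix)] == prefix}
    )


if __name__ == "__main__":
    # Reference sets at three increasingly general levels of the hierarchy.
    for level in (
        ("GPCR", "Class A", "Amine", "Dopamine", "D2"),
        ("GPCR", "Class A", "Amine", "Dopamine"),
        ("GPCR", "Class A", "Amine"),
    ):
        print(" / ".join(level), "->", ligands_at_level(ANNOTATIONS, level))
```

With this toy data, shortening the query path from the D2 receptor to all dopamine receptors and then to all amine-binding class A GPCRs widens the retrieved set step by step, which mirrors the three levels used in the retrospective screening experiments mentioned in the abstract.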
The newly identified macromolecular receptors may belong in part to established, therapeutically important target classes such as enzymes, GPCRs, NRs, and LGICs, which are the most successful drug target families and which are early examples of the systematization approach. Correspondingly, every newly discovered orphan receptor of these classes can be considered a potential drug target.1 Because of the broad knowledge existing about the previously investigated members of these families, including the structural classes of pharmaceutically active compounds and sequence information, it is a logical expectation that the pharmacological investigation of the new targets should benefit from knowledge-based compound selection and design strategies that try to extract relevant characteristics from the established knowledge. To realize this expectation, given that the chem- and bioinformatics worlds have evolved more or less independently, it is necessary to establish cross-references by appropriate annotation schemes.

* Corresponding author phone: +41 61 32 45385; fax: +41 61 3242395; e-mail: ansgar.schuffenhauer@pharma.novartis.com (Schuffenhauer); phone: +41 61 32 46186; fax: +41 61 3242395; e-mail: edgar.jacoby@pharma.novartis.com (Jacoby).
† Novartis Pharma AG, Drug Discovery Center, Compound Management and Computation Unit.
‡ Novartis Pharma AG, Central Technologies, Combinatorial Chemistry Unit.
§ University/ETH Zürich.

J. Chem. Inf. Comput. Sci. 2002, 42, 947-955. 10.1021/ci010385k CCC: $22.00 © 2002 American Chemical Society. Published on Web 05/23/2002

Annotation efforts in biosciences have focused in the past years mainly