Discriminating between Drugs and Nondrugs by Prediction of Activity Spectra for Substances (PASS)

Soheila Anzali,*,† Gerhard Barnickel, Bertram Cezanne, Michael Krug, Dmitrii Filimonov, and Vladimir Poroikov

Bio- and Chemoinformatics Department, Merck KGaA, Darmstadt D-64271, Germany, and Institute of Biomedical Chemistry of Russian Academy of Medical Sciences, Pogodinskaya Street, 10, Moscow 119832, Russia

Received August 31, 2000

Using the computer system PASS (prediction of activity spectra for substances), which simultaneously predicts several hundred biological activities, a training set for discriminating between drugs and nondrugs is created. For the training set, two subsets of databases of drugs and nondrugs (a subset of the World Drug Index, WDI, vs the Available Chemicals Directory, ACD) are used. The high value of prediction accuracy shows that the chemical descriptors and algorithms used in PASS provide highly robust structure-activity relationships and reliable predictions. The direct benchmark undertaken in this paper showed that the results obtained with PASS are in good accordance with those of other methods applied in this field. In addition, it has been shown that the more specific the drug information used in the PASS training set, the more specific the discrimination between drugs and nondrugs that can be obtained.

Introduction

In the past decade the drug discovery process has changed dramatically. The challenge of identifying novel leads has driven the need for automated systems that can rapidly select compounds at the beginning of the drug discovery process, namely in the analysis and the extension of the high throughput screening (HTS) pool. The number of discovered hits depends on the cutoff level, e.g., 10 µM. First, the activity has to be confirmed, followed by selectivity and functional assays. An important task is to reject false hits and focus on the promising molecules.
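As a rough illustration of how the number of hits depends on the cutoff level, the following Python sketch filters a small set of screening results at several thresholds. The compound names, the potency values, and the helper `select_hits` are hypothetical and are not taken from this work; values and cutoffs are assumed to share one concentration unit, with lower values indicating higher potency.

```python
# Hypothetical screening results: compound name -> measured potency.
# All values are invented for illustration and share one concentration unit;
# a lower value means a more potent compound.
screening_results = {
    "cpd_A": 0.5,
    "cpd_B": 8.0,
    "cpd_C": 12.0,
    "cpd_D": 45.0,
    "cpd_E": 9.9,
    "cpd_F": 150.0,
}


def select_hits(potencies, cutoff):
    """Return the sorted names of compounds with potency at or below the cutoff."""
    return sorted(name for name, value in potencies.items() if value <= cutoff)


if __name__ == "__main__":
    # A looser (higher) cutoff admits more compounds as hits.
    for cutoff in (1.0, 10.0, 50.0):
        hits = select_hits(screening_results, cutoff)
        print(f"cutoff {cutoff}: {len(hits)} hits -> {hits}")
```

Every hit selected this way would still have to pass confirmation, selectivity, and functional assays; the sketch only shows why the size of the hit list is a function of the chosen threshold.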
The lead molecule plays the pivotal role in the initiation of a lead optimization project. A promising lead compound with a desired pharmacological activity may have undesirable side effects, characteristics that limit its bioavailability, or structural features that adversely influence its metabolism and excretion from the body. Therefore biological activity has to be balanced with “drug-like” properties, and the closer we get to a candidate compound, the more important drug-likeness becomes.

Despite the many attempts 1-11 to classify compounds into the “drug” and “nondrug” categories, there is no unambiguous definition of drug and nondrug. In particular, the definition may vary depending on the indications or diseases considered. 12 Reagent databases such as ACD, 13 for example, are often used as model databases for nondrug compounds, while CMC, 14 WDI, 15 and MDDR 16 can be seen as databases of drugs. Certainly, some compounds in the ACD database may become drugs in the future, whereas a few compounds from MDDR and WDI will never be seen as drugs. Because structural features do not cleanly discriminate drug from nondrug compounds, different approaches have to be applied to compensate. As concluded by Walters et al., 17 “future work is likely to include additional approaches and more robust attempts at validation of these methods.”

The PASS program, 18-22 which is based on a regression approach applied to noncongeneric chemical series, provides highly robust predictions for more than 500 biological activities. Since PASS is trained to recognize drugs with activities on various targets, the approach may have potential use in discriminating drugs from nondrugs. The purpose of this work is to evaluate the ability of the PASS approach to discriminate between drug-like compounds and nondrugs.

Materials and Methods

PASS Approach.
The computer system PASS (prediction of activity spectra for substances) 18-21 predicts several hundred biological activities (pharmacological main and side effects, mechanisms of action, mutagenicity, carcinogenicity, teratogenicity, and embryotoxicity). Biological activity results from the interaction of chemical compounds with biological entities. In clinical studies, the biological entity is the whole human organism. In preclinical testing, it is the experimental animal (in vivo) and/or the experimental model (in vitro). Biological activity depends on the particular features of the compound (structure and physicochemical properties), the biological entity (species, gender, age, etc.), and the mode of treatment (dose, route of administration, etc.).

* Correspondence: Soheila Anzali, Ph.D., Merck KGaA, Bio- and Chemoinformatics Department, Frankfurter Str. 250, D-64271 Darmstadt, Germany. Tel: +49-6151-724863. Fax: +49-6151-7233299. E-mail: soheila.anzali@merck.de.
† Merck KGaA.
Institute of Biomedical Chemistry of Russian Academy of Medical Sciences.
J. Med. Chem. 2001, 44, 2432-2437. 10.1021/jm0010670 CCC: $20.00 © 2001 American Chemical Society. Published on Web 06/14/2001

The majority of biologically active compounds often reveal a wide spectrum of different effects. Some of them are useful in the treatment of specific diseases; others cause various side and toxic effects. The whole complex of activities caused by the compound in biological entities is called the “biological activity spectrum of the substance”. The biological activity spectrum of a compound presents all its activities despite the difference in essential conditions of