Discriminating between Drugs and Nondrugs by Prediction of Activity Spectra
for Substances (PASS)
Soheila Anzali,*,† Gerhard Barnickel,† Bertram Cezanne,† Michael Krug,† Dmitrii Filimonov,‡ and Vladimir Poroikov‡
Bio- and Chemoinformatics Department, Merck KGaA, Darmstadt D-64271, Germany, and Institute of Biomedical Chemistry
of Russian Academy of Medical Sciences, Pogodinskaya Street, 10, Moscow 119832, Russia
Received August 31, 2000
Using the computer system PASS (prediction of activity spectra for substances), which
simultaneously predicts several hundred biological activities, a training set for discriminating
between drugs and nondrugs is created. For the training set, two subsets of databases of drugs
and nondrugs (a subset of the World Drug Index, WDI, vs the Available Chemicals Directory,
ACD) are used. The high prediction accuracy shows that the chemical descriptors and
algorithms used in PASS provide highly robust structure-activity relationships and reliable
predictions. A direct benchmark undertaken in this paper showed that the results obtained
with PASS are in good accordance with those of other methods applied in this field. In
addition, it has been shown that the more specific the drug information used in the PASS
training set, the more specific the discrimination between drugs and nondrugs that can be
obtained.
Introduction
In the past decade the drug discovery process has
changed dramatically. The challenge to identify novel
leads has driven the need for automated systems that
can rapidly perform selection of compounds at the
beginning of the drug discovery process, namely in the
analysis and the extension of the high throughput
screening (HTS) pool. The number of discovered hits
depends on the cutoff level, e.g., 10 μM. First, the
activity has to be confirmed, followed by selectivity
and functional assays.
An important task is to reject false hits and to
focus on the promising molecules. The lead molecule
plays a pivotal role in initiating a lead
optimization project. A promising lead compound with
a desired pharmacological activity may have undesirable
side effects, characteristics that limit its bioavailability,
or structural features that adversely influence
its metabolism and excretion from the body.
Therefore, biological activity has to be balanced with
“drug-like” properties, and the closer we get to a
candidate compound, the more important drug-likeness
becomes. Despite the many attempts¹⁻¹¹ to classify
compounds into the “drug” and “nondrug” categories,
there is no unambiguous definition of drug and nondrug.
In particular, the definition may vary depending on the
indications or diseases considered.¹² Reagent databases
such as ACD¹³ are often used as model databases
for nondrug compounds, while CMC,¹⁴ WDI,¹⁵ and
MDDR¹⁶ could be seen as databases for drugs. Certainly,
some compounds in the ACD database may become drugs
in the future, whereas a few compounds from MDDR and
WDI will never be seen as drugs.
Because of the lack of discrimination among structural
features for drug and nondrug compounds, different
approaches have to be applied to compensate. As
concluded by Walters et al.,¹⁷ “future work is likely to
include additional approaches and more robust attempts
at validation of these methods.”
The PASS program,¹⁸⁻²² which is based on a regression
approach applied to noncongeneric chemical series,
provides highly robust predictions for more than 500
biological activities. Since PASS is trained to recognize
drugs with activities on various targets, the approach
may be useful for discriminating drugs from
nondrugs. The purpose of this work is to evaluate the
ability of the PASS approach to discriminate between
drug-like compounds and nondrugs.
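The evaluation outlined here amounts to a two-class problem: compounds taken from a drug database carry the label "drug", compounds taken from a reagent database carry the label "nondrug", and a predictor is judged by how often it recovers the label. A minimal sketch of that bookkeeping follows. The function name `classification_accuracy`, the label strings, and the toy predictions are assumptions made for illustration; the sketch does not reproduce the PASS algorithm or any result of this work.

```python
def classification_accuracy(predicted, actual):
    """Return the fraction of compounds whose predicted class equals the label."""
    if len(predicted) != len(actual):
        raise ValueError("predicted and actual must have the same length")
    if not actual:
        raise ValueError("at least one compound is required")
    correct = sum(p == a for p, a in zip(predicted, actual))
    return correct / len(actual)


# Toy labels only (hypothetical): "drug" entries would come from a drug
# database subset, "nondrug" entries from a reagent database.
actual = ["drug", "drug", "nondrug", "nondrug"]
predicted = ["drug", "nondrug", "nondrug", "nondrug"]

accuracy = classification_accuracy(predicted, actual)
print(accuracy)
```

In practice the accuracy would be reported separately for each class as well, since the two databases can differ greatly in size.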
Materials and Methods
PASS Approach. The computer system PASS (prediction
of activity spectra for substances)¹⁸⁻²¹ predicts several
hundred biological activities (pharmacological main and side
effects, mechanisms of action, mutagenicity, carcinogenicity,
teratogenicity, and embryotoxicity).
Biological activity results from the interaction of chemical
compounds with biological entities. In clinical studies, the
biological entity is the whole human organism. In preclinical
testing, it is the experimental animal (in vivo) and/or the
experimental model (in vitro). Biological activity depends on
the particular features of the compound (structure and
physicochemical properties), the biological entity (species,
gender, age, etc.), and the mode of treatment (dose, route of
administration, etc.).
Most biologically active compounds show a wide spectrum
of different effects. Some of them are useful
in the treatment of particular diseases; others cause various
side and toxic effects. The whole complex of activities caused
by the compound in biological entities is called the “biological
activity spectrum of the substance”.
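As an illustration of this definition, an activity spectrum can be pictured as the set of all activities observed for a compound, pooled across the conditions under which they were observed. The following is a minimal sketch and not the internal representation used by PASS; the function name `activity_spectrum`, the record layout, and the example records are assumptions made for illustration.

```python
from collections import defaultdict


def activity_spectrum(records):
    """Pool observed activities per compound, ignoring test conditions.

    `records` is an iterable of (compound, activity, condition) tuples.
    This layout is hypothetical and chosen only for illustration.
    """
    spectra = defaultdict(set)
    for compound, activity, _condition in records:
        spectra[compound].add(activity)
    return dict(spectra)


# Illustrative records only (not data from the paper).
records = [
    ("compound_A", "analgesic", "in vivo"),
    ("compound_A", "anti-inflammatory", "in vitro"),
    ("compound_A", "analgesic", "clinical"),
    ("compound_B", "mutagenic", "in vitro"),
]

spectra = activity_spectrum(records)
print(sorted(spectra["compound_A"]))
```

Using a set per compound means that an activity reported under several conditions appears only once in the spectrum.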
The biological activity spectrum of a compound presents all
its activities despite the difference in essential conditions of
* Correspondence: Soheila Anzali, Ph.D., Merck KGaA, Bio- and
Chemoinformatics Department, Frankfurter Str. 250, D-64271 Darmstadt,
Germany. Tel: +49-6151-724863. Fax: +49-6151-7233299.
E-mail: soheila.anzali@merck.de.
† Merck KGaA.
‡ Institute of Biomedical Chemistry of Russian Academy of Medical Sciences.
2432 J. Med. Chem. 2001, 44, 2432-2437
10.1021/jm0010670 CCC: $20.00 © 2001 American Chemical Society
Published on Web 06/14/2001