Explicit Diversity Index (EDI): A Novel Measure for Assessing the Diversity of
Compound Databases
Ákos Papp,† Anna Gulyás-Forró,† Zsolt Gulyás,‡ György Dormán,† László Ürge,† and Ferenc Darvas*,§
AMRI Hungary, Záhony u. 7, H-1031 Budapest, Hungary, ComGrid Ltd., Záhony u. 7, H-1031 Budapest,
Hungary, and CompuDrug International, Inc., 115 Morgan Drive, Sedona, Arizona 86351
Received March 7, 2006
A novel diversity assessment method, the Explicit Diversity Index (EDI), is introduced for druglike molecules.
EDI combines structural and synthesis-related dissimilarity values and expresses them as a single number.
As an easily interpretable measure, it facilitates decision making in the design of combinatorial libraries,
and it might assist in the comparison of compound sets provided by different manufacturers. Because of its
rapid calculation algorithm, EDI enables the diversity assessment of in-house or commercial compound
collections.
INTRODUCTION
Molecular diversity is one of the most important characteristics of
screening libraries.¹⁻⁴ To increase the hit rate, the use of the most
representative compound set that covers the chemical space relevant to
the appropriate target is advised.¹ To select the most favorable
screening library, a simple measure (preferably expressed as a single
number) that explicitly informs the medicinal chemist about the
diversity of the investigated compound set would be advantageous.
There are already numerous diversity assessing methods in the
literature which measure diversity on the basis of specific structural
features of the molecules combined with different mathematical
methods.⁵⁻¹⁷ Many diverse library design procedures deal with the
selection of the most appropriate reagent sets and building blocks for
synthesis.¹⁸⁻²¹ In the present paper, we propose a novel diversity
assessment procedure, the Explicit Diversity Index (EDI), which
describes diversity explicitly as a single number through combining
structural and combinatorial synthesis-related dissimilarity. EDI is
developed to provide assistance in designing diverse libraries
primarily in the field of drug discovery.
A recent study demonstrated²² that structural dissimilarity does not
contribute directly to biological activity. Surprisingly, only 30% of
the purchased compounds having a Tanimoto similarity above 85% were
confirmed in biological tests. Thus, structural similarity or
dissimilarity does not influence the hit rate as expected. A recent
publication focused more on scaffold diversity, claiming that an
increased hit rate can be achieved with more scaffolds²³ (in other
words, with various diverse cores or chemotypes). According to this
“uniform library concept”, an equal scaffold distribution is the
optimum within a library.
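For readers unfamiliar with the Tanimoto measure mentioned above, the following minimal Python sketch shows how the coefficient is commonly computed for two binary fingerprints and how a threshold such as 85% would be applied. The fingerprints are invented bit sets used only for illustration; the fingerprint type used in the cited study is not stated in this section, and the function name `tanimoto` is a hypothetical helper.

```python
def tanimoto(fp_a: frozenset, fp_b: frozenset) -> float:
    """Tanimoto coefficient of two fingerprints given as sets of 'on' bit positions."""
    if not fp_a and not fp_b:
        return 1.0  # convention chosen here for two empty fingerprints (assumption)
    shared = len(fp_a & fp_b)
    # shared bits divided by the number of bits set in either fingerprint
    return shared / (len(fp_a) + len(fp_b) - shared)


# Hypothetical fingerprints, invented for illustration only.
query = frozenset({1, 4, 7, 9, 12, 15, 18, 21})
analog = frozenset({1, 4, 7, 9, 12, 15, 18, 30})

similarity = tanimoto(query, analog)   # 7 shared bits out of 9 distinct bits
is_close_analog = similarity > 0.85    # the "above 85%" criterion from the text
```

In this toy case the two fingerprints differ in a single bit position each, yet the pair already falls below the 85% threshold, which illustrates how strict that criterion is.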
More recently, shape diversity²⁴ was also described, which relied on
the 3D structure of the chemotype or skeleton. In this concept, common
molecular skeletons display similar chemical information in the 3D
space. On the basis of these assumptions, Burke and Schreiber²⁵
introduced the term skeletal diversity in connection with
forward-synthetic planning and diversity-oriented synthesis. They
described various examples of their “one synthesis/one skeleton”
approach, significantly increasing the chemical diversity.
Hogan has published an example demonstrating two magnitudes of
scaffold structures:²⁶ First, they may inherently represent biological
activity, which is enhanced by the substituents introduced during
synthesis. In other cases, no specific biological activity can be
assigned to the core structure, but it provides a skeleton for
arranging the substituents into the required directions. Walters et
al.²⁷ noted that the latter case is particularly important in the
field of drug design and combinatorial chemistry. In conclusion, the
scaffold distribution (or, as we termed it, core representativeness)
is closely related to the combinatorial realization of diversity and,
thus, to the practice of library design and synthesis.
In the proposed EDI approach, structural dissimilarity and the above
combinatorial synthetic aspects are equally accounted for, leading to
a practical diversity measure. Structural dissimilarity is generally
calculated by a comparison of pairwise dissimilarities of a target and
a reference set. The combinatorial synthetic design aspects are
considered by the involvement of “core representativeness”.
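As a rough illustration of comparing pairwise dissimilarities between a target and a reference set, the sketch below finds, for each target molecule, its smallest Tanimoto dissimilarity to any reference molecule and then averages those values. This is a minimal sketch under stated assumptions, not the EDI algorithm itself: fingerprints are represented as sets of bit positions, Tanimoto dissimilarity (1 minus similarity) is assumed as the metric, and all names and data are hypothetical.

```python
from typing import FrozenSet, Sequence

Fingerprint = FrozenSet[int]  # set of 'on' bit positions (assumed representation)


def tanimoto_dissimilarity(a: Fingerprint, b: Fingerprint) -> float:
    """1 - Tanimoto similarity; two empty fingerprints are treated as identical."""
    union = len(a | b)
    if union == 0:
        return 0.0
    return 1.0 - len(a & b) / union


def mean_nearest_neighbor_dissimilarity(
    target: Sequence[Fingerprint], reference: Sequence[Fingerprint]
) -> float:
    """Average, over the target set, of the dissimilarity to the closest reference molecule."""
    if not target or not reference:
        raise ValueError("both sets must contain at least one fingerprint")
    nearest = [
        min(tanimoto_dissimilarity(t, r) for r in reference)  # nearest neighbor of t
        for t in target
    ]
    return sum(nearest) / len(nearest)


# Invented toy data, for illustration only.
reference_set = [frozenset({1, 2, 3, 4}), frozenset({5, 6, 7, 8})]
target_set = [frozenset({1, 2, 3, 4}), frozenset({1, 2, 9, 10}), frozenset({20, 21})]

score = mean_nearest_neighbor_dissimilarity(target_set, reference_set)
```

A score near 0 means that every target molecule has a close analog in the reference set, while a score near 1 means that the target set occupies a different region of the fingerprint space. The brute-force loop costs one comparison per target-reference pair, which is adequate for a sketch but not tuned for large collections.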
In the present paper, we discuss the calculation of the elements of
EDI as well as its testing and application, examining various compound
collections that are important in the practice of combinatorial drug
discovery.
METHOD DESCRIPTION
Calculation of Structural Dissimilarity. To characterize the
structural dissimilarity, the nearest-neighbor pairwise
dissimilarities of the molecules in the library are calculated.
For calculation of the nearest-neighbor’s average value, w
* Corresponding author phone: +36 1 214 2306; e-mail: df.compudrug@worldnet.att.net.
† AMRI Hungary.
‡ ComGrid Ltd.
§ CompuDrug International.
J. Chem. Inf. Model. 2006, 46, 1898-1904
10.1021/ci060074f CCC: $33.50 © 2006 American Chemical Society
Published on Web 08/17/2006