Explicit Diversity Index (EDI): A Novel Measure for Assessing the Diversity of Compound Databases

Ákos Papp, Anna Gulyás-Forró, Zsolt Gulyás, György Dormán, László Ürge, and Ferenc Darvas*

AMRI Hungary, Záhony u. 7, H-1031 Budapest, Hungary, ComGrid Ltd., Záhony u. 7, H-1031 Budapest, Hungary, and CompuDrug International, Inc., 115 Morgan Drive, Sedona, Arizona 86351

Received March 7, 2006

A novel diversity assessment method, the Explicit Diversity Index (EDI), is introduced for druglike molecules. EDI combines structural and synthesis-related dissimilarity values and expresses them as a single number. As an easily interpretable measure, it facilitates decision making in the design of combinatorial libraries, and it might assist in the comparison of compound sets provided by different manufacturers. Because of its rapid calculation algorithm, EDI enables the diversity assessment of in-house or commercial compound collections.

INTRODUCTION

Molecular diversity is one of the most important characteristics of screening libraries. 1-4 To increase the hit rate, the use of the most representative compound set that covers the chemical space relevant to the appropriate target is advised. 1 To select the most favorable screening library, a simple measure (preferably expressed as a single number) that explicitly informs the medicinal chemist about the diversity of the investigated compound set would be advantageous. There are already numerous diversity assessment methods in the literature that measure diversity on the basis of specific structural features of the molecules combined with different mathematical methods. 5-17 Many diverse library design procedures deal with the selection of the most appropriate reagent sets and building blocks for synthesis.
18-21 In the present paper, we propose a novel diversity assessment procedure, the Explicit Diversity Index (EDI), which describes diversity explicitly as a single number by combining structural and combinatorial synthesis-related dissimilarity. EDI was developed to assist in designing diverse libraries, primarily in the field of drug discovery.

A recent study demonstrated 22 that structural dissimilarity does not contribute directly to biological activity. Surprisingly, only 30% of the purchased compounds having a Tanimoto similarity above 85% have been confirmed in biological tests. Thus, structural similarity or dissimilarity does not influence the hit rate as expected. A recent publication focused more on scaffold diversity, claiming that the hit rate can be increased with more scaffolds 23 (in other words, with various diverse cores or chemotypes). According to this “uniform library concept”, an equal scaffold distribution is the optimum within a library.

More recently, shape diversity 24 was also described, which relies on the 3D structure of the chemotype or skeleton. In this concept, common molecular skeletons display similar chemical information in 3D space. On the basis of these assumptions, Burke and Schreiber 25 introduced the term skeletal diversity in connection with forward-synthetic planning and diversity-oriented synthesis. They described various examples of their “one synthesis/one skeleton” approach, which significantly increases chemical diversity.

Hogan has published an example demonstrating two magnitudes of scaffold structures: 26 First, they may inherently represent biological activity, which is enhanced by the substituents introduced during synthesis. In other cases, no specific biological activity can be assigned to the core structure, but it provides a skeleton for arranging the substituents into the required directions. Walters et al.
27 noted that the latter case is particularly important in the field of drug design and combinatorial chemistry. In conclusion, the scaffold distribution (or, as we termed it, core representativeness) is closely related to the combinatorial realization of diversity and, thus, to the practice of library design and synthesis.

In the proposed EDI approach, structural dissimilarity and the above combinatorial synthetic aspects are equally accounted for, leading to a practical diversity measure. Structural dissimilarity is generally calculated by a comparison of pairwise dissimilarities of a target and a reference set. The combinatorial synthetic design aspects are considered through the involvement of “core representativeness”.

In the present paper, we discuss the calculation of the elements of EDI as well as its testing and application, examining various compound collections that are important in the practice of combinatorial drug discovery.

METHOD DESCRIPTION

Calculation of Structural Dissimilarity. To characterize the structural dissimilarity, the nearest-neighbor pairwise dissimilarities of the molecules in the library are calculated. For calculation of the nearest-neighbor’s average value, we

* Corresponding author phone: +36 1 214 2306; e-mail: df.compudrug@worldnet.att.net.
AMRI Hungary. ComGrid Ltd. § CompuDrug International.

J. Chem. Inf. Model. 2006, 46, 1898-1904. 10.1021/ci060074f CCC: $33.50 © 2006 American Chemical Society. Published on Web 08/17/2006.
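
As a rough illustration of the nearest-neighbor pairwise dissimilarity mentioned under Calculation of Structural Dissimilarity, the following minimal sketch computes, for each molecule, the dissimilarity to its closest neighbor in the library and then averages these values. Several things here are assumptions and not taken from the paper: fingerprints are represented as sets of on-bit indices, dissimilarity is taken as 1 minus the Tanimoto coefficient, and all function names and the toy data are hypothetical.

```python
from typing import FrozenSet, List, Sequence


def tanimoto_dissimilarity(a: FrozenSet[int], b: FrozenSet[int]) -> float:
    """Return 1 - Tanimoto similarity for two fingerprints given as on-bit sets."""
    union = len(a | b)
    if union == 0:
        # Assumption: two empty fingerprints are treated as identical.
        return 0.0
    return 1.0 - len(a & b) / union


def nearest_neighbor_dissimilarities(fps: Sequence[FrozenSet[int]]) -> List[float]:
    """For each fingerprint, return the dissimilarity to its closest other member."""
    if len(fps) < 2:
        raise ValueError("need at least two molecules")
    result = []
    for i, fp in enumerate(fps):
        # Smallest dissimilarity to any other molecule in the library.
        nearest = min(
            tanimoto_dissimilarity(fp, other)
            for j, other in enumerate(fps)
            if j != i
        )
        result.append(nearest)
    return result


def mean_nearest_neighbor_dissimilarity(fps: Sequence[FrozenSet[int]]) -> float:
    """Average of the per-molecule nearest-neighbor dissimilarities."""
    values = nearest_neighbor_dissimilarities(fps)
    return sum(values) / len(values)


# Toy fingerprints (hypothetical on-bit indices, not real molecules).
library = [
    frozenset({1, 2, 3, 4}),
    frozenset({1, 2, 3, 5}),
    frozenset({10, 11, 12}),
]
print(nearest_neighbor_dissimilarities(library))
print(mean_nearest_neighbor_dissimilarity(library))
```

In this toy library the first two fingerprints share three of five distinct bits, so each is the other's nearest neighbor, while the third shares no bits with either and is maximally dissimilar to both. The brute-force loop is quadratic in library size; it is meant only to make the definition concrete, not to reflect the rapid algorithm the paper refers to.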