Activity Landscapes, Information Theory, and Structure–Activity Relationships

Preeti Iyer,[a] Dagmar Stumpfe,[a] Martin Vogt,[a] J. Bajorath,*[a] and G. M. Maggiora*[b]

DOI: 10.1002/minf.201200120

1 Introduction

The pace at which new activity data are being generated has increased almost exponentially in the last decade, giving rise to a growing number of large public-domain databases such as ChEMBL,[1] BindingDB,[2] or PubChem.[3] This has necessitated the development of new methods for characterizing and analyzing compound activity data. These methods draw heavily on the concepts of molecular similarity and chemical space,[4] in which individual molecules are represented as points: the more similar a pair of molecules, the shorter the distance between their points in that space. As is well known, similarity measures and their associated chemical spaces are not invariant to the features or descriptors used to define them. Thus, different representations can induce different distributions of molecules in chemical space. Although this can potentially lead to problems, numerous examples demonstrate the practical utility of these representations in many cheminformatics applications. These include the analysis and clustering of compound collections, the selection of subsets for screening, the acquisition of compounds to enlarge and enrich compound collections, and the application of ligand-based virtual screening in early drug discovery.[5–10] Because of their importance in drug discovery research, a number of approaches have been developed that attempt to address, or circumvent, the non-invariance problem, including the use of data fusion and consensus similarity measures,[11–15] albeit often with only partial success.

Activity landscapes can be constructed from chemical spaces by adding a dimension associated with biological activity. The resulting landscapes resemble those represented by topographical maps.
Thus, they provide a strong visual metaphor for describing and interpreting the local and global features associated with the biological activities of molecules in large compound collections.[16–18] Four types of topographic features are typically found in activity landscapes: (1) activity cliffs, (2) similarity cliffs, (3) smooth-SAR regions, and (4) featureless regions. Only the first three are relevant to drug discovery research and thus,

Abstract: Activity landscapes provide a comprehensive description of structure–activity relationships (SARs). An information-theoretic assessment of their features, namely activity cliffs, similarity cliffs, smooth-SAR regions, and featureless regions, is presented based on the probability of occurrence of these features. It is shown that activity cliffs provide highly informative SARs compared with smooth-SAR regions, although the latter are the basis for most QSAR studies. This follows because small structural changes in the former are coupled with relatively large changes in activity, thus pinpointing specific structural features associated with the changes in activity. In contrast, smooth-SAR regions are typically associated with relatively small changes in both structure and activity. Surprisingly, similarity cliffs, which occur when both compounds in a compound pair have approximately equal activities but significantly different structures, are the most prevalent feature of activity landscapes. Hence, from an information-theoretic point of view, they are the least informative landscape feature. Nevertheless, similarity cliffs do provide SAR information on potentially new active compound classes, and in that sense they are quite useful in drug discovery programs, since they provide alternatives should ADMET or other issues arise during the discovery and early preclinical development phases of drug research.
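The argument in the abstract, that the most prevalent landscape feature is the least informative, can be formalized in a natural way with the self-information of a feature f, I(f) = −log2 p(f), where p(f) is the fraction of compound pairs exhibiting f; the article's exact formulation may differ. The Python sketch below illustrates this idea under explicit assumptions that are not taken from the article: fingerprints are plain sets of on-bit indices compared with the Tanimoto coefficient, pairs are assigned to the four features by a simple 2×2 split using the hypothetical thresholds `SIM_THRESHOLD` and `DELTA_THRESHOLD`, the placement of "featureless" pairs (dissimilar structures with a large potency difference) is assumed, and `TOY_COMPOUNDS` is invented data.

```python
from collections import Counter
from itertools import combinations
from math import log2

# Hypothetical thresholds chosen only for illustration; they are not from the article.
SIM_THRESHOLD = 0.6    # Tanimoto similarity at or above which a pair counts as "similar"
DELTA_THRESHOLD = 2.0  # absolute potency difference (log units) at or above which it is "large"


def tanimoto(fp_a, fp_b):
    """Tanimoto coefficient of two fingerprints given as sets of on-bit indices."""
    union = len(fp_a | fp_b)
    # Two empty fingerprints share no features; treat them as dissimilar.
    return len(fp_a & fp_b) / union if union else 0.0


def classify_pair(sim, delta):
    """Assign a compound pair to a landscape feature via an assumed 2x2 split."""
    similar = sim >= SIM_THRESHOLD
    large_change = abs(delta) >= DELTA_THRESHOLD
    if similar and large_change:
        return "activity cliff"    # small structural change, large activity change
    if similar:
        return "smooth SAR"        # small changes in both structure and activity
    if not large_change:
        return "similarity cliff"  # different structures, approximately equal activity
    return "featureless"           # assumed: different structures and different activity


def feature_information(compounds):
    """Return {feature: (probability, self-information in bits)} over all compound pairs.

    compounds maps a name to (fingerprint as a set of bits, potency in log units).
    """
    counts = Counter()
    for (fp_a, pot_a), (fp_b, pot_b) in combinations(compounds.values(), 2):
        counts[classify_pair(tanimoto(fp_a, fp_b), pot_a - pot_b)] += 1
    total = sum(counts.values())
    # I(f) = -log2 p(f): the more frequent a feature, the fewer bits it carries.
    return {f: (n / total, -log2(n / total)) for f, n in counts.items()}


# Invented toy data set (hypothetical compounds and potencies).
TOY_COMPOUNDS = {
    "A": ({1, 2, 3, 4, 5}, 8.0),
    "B": ({1, 2, 3, 4, 6}, 5.0),
    "C": ({1, 2, 3, 4, 5, 7}, 7.5),
    "D": ({20, 21, 22}, 7.75),
    "E": ({30, 31}, 8.25),
}

if __name__ == "__main__":
    info = feature_information(TOY_COMPOUNDS)
    for feature, (p, bits) in sorted(info.items(), key=lambda kv: kv[1][1]):
        print(f"{feature:16s} p = {p:.2f}  I = {bits:.2f} bits")
```

With these toy data, 5 of the 10 pairs are similarity cliffs (p = 0.5, I = 1 bit), whereas the single activity-cliff pair (A, B) has p = 0.1 and carries about 3.32 bits, mirroring the point that the most prevalent feature is the least informative. A real analysis would use fingerprints computed from actual structures and thresholds justified for the data set at hand.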
Keywords: Structure–activity relationships · Molecular similarity · Activity landscapes · Activity cliffs · Similarity cliffs · Information theory

[a] P. Iyer, D. Stumpfe, M. Vogt, J. Bajorath
Department of Life Science Informatics, B-IT, LIMES Program Unit Chemical Biology and Medicinal Chemistry, Rheinische Friedrich-Wilhelms-Universität Bonn, Dahlmannstr. 2, D-53113 Bonn
phone/fax: +49-228-2699-306/341
*e-mail: bajorath@bit.uni-bonn.de

[b] G. M. Maggiora
College of Pharmacy & BIO5 Institute, University of Arizona, 1295 North Martin, PO Box 210202, Tucson, AZ 85721, USA; Translational Genomics Research Institute, 445 North Fifth Street, Phoenix, AZ 85004, USA
*e-mail: gerry.maggiora@gmail.com

Supporting Information for this article is available on the WWW under http://dx.doi.org/10.1002/minf.201200120

Special Issue EuroQSAR
Mol. Inf. 0000, 00, 1–10 © 0000 Wiley-VCH Verlag GmbH & Co. KGaA, Weinheim