Ordinal Classification Using Comparative Molecular Field Analysis

Takanori Ohgaru,†,‡ Ryo Shimizu, Kousuke Okamoto, Masaya Kawase,§ Yuko Shirakuni, Rika Nishikiori,§ and Tatsuya Takagi*,†,|

Graduate School of Pharmaceutical Sciences, Osaka University, 1-6 Yamadaoka, Suita, Osaka 565-0871, Japan, Tanabe Seiyaku Co., Ltd., 3-16-89 Kashima, Yodogawa-Ku, Osaka 532-8505, Japan, Faculty of Pharmacy, Osaka Ohtani University, 3-11-1 Nishikiorikita, Tondabayashi, Osaka 584-8540, Japan, and Research Institute for Microbial Diseases, Osaka University, 3-1 Yamadaoka, Suita, Osaka 565-0871, Japan

Received July 4, 2007

Comparative Molecular Field Analysis (CoMFA) is one of the most widely used 3-dimensional QSAR (3D-QSAR) methods for identifying the relationship between chemical structure and biological activity. Conventional CoMFA requires experimental data, such as IC₅₀ and Kᵢ values, spanning at least 3 orders of magnitude to yield a good model, yet in practice there are many screening assays in which biological activity is measured only on a rating scale. We have therefore developed a rating classification-oriented CoMFA coupled with ordinal logistic regression and investigated its predictive ability and its capacity for 3D graphical analysis. This novel CoMFA (Logistic CoMFA) was found to be more robust than conventional CoMFA in both respects. Furthermore, Logistic CoMFA is useful because it provides the probability of each rank.

INTRODUCTION

A detailed understanding of the quantitative structure-activity relationship (QSAR) is one of the principal goals of medicinal chemistry. Clarifying the relationship between chemical structure and biological activity is particularly important in the hit-to-lead stage of drug discovery, in which researchers must identify various properties of a large number of compounds within a limited period.
The growing need for early ADMET evaluation¹⁻³ increases the number of biological assays, such as Caco-2 cell permeability, CYP family inhibition, and hERG blockade, run per compound. Unfortunately, screening data from the early stage of drug discovery contain more experimental error than data from the reliable assays employed in the late stage. Since QSAR analysis generally uses IC₅₀ and pKᵢ values as the indices of biological response, non-negligible differences between experimental and true IC₅₀/pKᵢ values can be found in some screening assays.⁴,⁵ In addition, there are many in vivo assays in which biological activity is measured only on a rating scale. These circumstances make it difficult to build a good QSAR model.

Predicting an activity rating, in which the potency of a compound is assigned only roughly, makes it possible to analyze quantitatively data sets that have so far been too noisy for quantitative analysis. Such data require dedicated treatment for rating classification, because the ratings are not expressed on a metric scale.

Several studies have applied rating classification to classical QSAR. Martin et al.⁶ conducted a classical QSAR analysis of monoamine oxidase inhibition by using a rating scale with linear discriminant analysis (LDA). Dunn et al.⁷ analyzed the QSAR of β-adrenergic agents with the SIMCA (Soft Independent Modelling of Class Analogy) method,⁸ which is based on a pattern recognition technique. LDA and SIMCA are not considered suitable for rating classification, because both were formulated under the assumption that the classes are independent and therefore do not exploit the order among them. To harness the characteristics of ordinal classes, Takahashi et al.⁹ developed ORMUCS (ORdered Multicategorical Classification using Simplex optimization technique). ORMUCS is also a pattern recognition method; it determines a discriminant function by simplex optimization.
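As a small illustration of why ordered ratings call for order-aware treatment, the following Python sketch bins continuous potencies into ratings and compares an error measure that ignores class order with one that respects it. The cut points, the function names, and the example values are hypothetical and are not taken from the data sets analyzed in this study.

```python
from bisect import bisect_right

# Hypothetical pIC50 cut points separating four ordered ratings
# (0 = weakest, 3 = strongest); an assumption for illustration only.
CUTS = [5.0, 6.0, 7.0]


def to_rating(pic50, cuts=CUTS):
    """Map a continuous potency value to an ordinal rating index."""
    return bisect_right(cuts, pic50)


def zero_one_error(true, pred):
    """Fraction of misclassified compounds; ignores how far off a miss is."""
    return sum(t != p for t, p in zip(true, pred)) / len(true)


def mean_rank_error(true, pred):
    """Mean absolute difference in rating; distant misses cost more."""
    return sum(abs(t - p) for t, p in zip(true, pred)) / len(true)


pic50 = [4.2, 5.5, 6.8, 7.9]              # made-up potencies
true = [to_rating(v) for v in pic50]      # [0, 1, 2, 3]

near_miss = [1, 2, 3, 2]   # every prediction is off by one rating
far_miss = [3, 3, 0, 0]    # every prediction is off by two or three ratings

print(zero_one_error(true, near_miss), zero_one_error(true, far_miss))    # 1.0 1.0
print(mean_rank_error(true, near_miss), mean_rank_error(true, far_miss))  # 1.0 2.5
```

Both sets of predictions misclassify every compound, so the zero-one error cannot tell them apart, whereas the mean rank error separates the near misses from the far ones. A method that treats classes as unordered sees only the first kind of information.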
Apart from these methods, it is possible to apply ordinal logistic regression (OLR) to QSAR analysis. OLR is a statistical method that uses the probability of each rating for classification. In fact, OLR is one of the most popular methods in social psychological studies and is also often applied to clinical data.¹⁰,¹¹

Comparative Molecular Field Analysis (CoMFA) has become one of the most widely used 3-dimensional QSAR (3D-QSAR) methods¹²⁻¹⁴ since it was introduced by Cramer et al.¹⁵ to identify the relationship between 3-dimensional molecular structure and biological activity. The availability of commercial cheminformatics tools such as Sybyl, together with high-performance CPUs, has made 3D-QSAR analysis convenient to use. However, a rating classification-oriented 3D-QSAR method has not yet been developed. Considering the prevalence of 3D-QSAR, it is desirable to be able to classify ratings with a 3D-QSAR method. 3D-QSAR analysis with SIMCA has not been used for rating classification and, unfortunately, has been limited to dichotomous (active/inactive) analysis¹⁶ or selectivity analysis.¹⁷

* Corresponding author e-mail: satan@gen-info.osaka-u.ac.jp.
† Graduate School of Pharmaceutical Sciences, Osaka University.
‡ Tanabe Seiyaku Co., Ltd.
§ Osaka Ohtani University.
| Research Institute for Microbial Diseases, Osaka University.

J. Chem. Inf. Model. 2008, 48, 207-212. DOI: 10.1021/ci700238k. © 2008 American Chemical Society. Published on Web 12/28/2007.

In this study, we present the development and applicability of a novel rating classification-oriented CoMFA with OLR and compare it to conventional CoMFA analysis using 2 data sets. One data set is the corticosteroid binding globulin