Enhancement of Ordinal CoMFA by Ridge Logistic Partial Least Squares

Takanori Ohgaru,†,‡ Ryo Shimizu,§ Kosuke Okamoto, Norihito Kawashita,†,| Masaya Kawase, Yuko Shirakuni, Rika Nishikiori, and Tatsuya Takagi*,†,|

Graduate School of Pharmaceutical Sciences, Osaka University, 1-6 Yamadaoka, Suita, Osaka 565-0871, Japan, Medicinal Chemistry Laboratory, Mitsubishi Tanabe Pharma Corporation, 3-16-89 Kashima, Yodogawa-Ku, Osaka 532-8505, Japan, Corporate Strategy Department, Mitsubishi Tanabe Pharma Corporation, 3-2-10, Dosho-machi, Chuo-Ku, Osaka 541-8505, Japan, Research Institute for Microbial Diseases, Osaka University, 3-1 Yamadaoka, Suita, Osaka 565-0871, Japan, and Faculty of Pharmacy, Osaka Ohtani University, 3-11-1 Nishikiorikita, Tondabayashi, Osaka 584-8540, Japan

Received November 29, 2007

Conventional comparative molecular field analysis (CoMFA) requires experimental data, such as IC50 and Ki values, spanning at least 3 orders of magnitude to obtain a good model, although in practice there are many screening assays in which biological activity is measured only on a rating scale. To improve three-dimensional quantitative structure–activity relationship (3D-QSAR) analysis, we developed in this study a modified ordinal classification-oriented CoMFA using partial-least-squares generalized linear regression and ridge estimation. The modified Logistic CoMFA was validated using a corticosteroid binding globulin receptor binding data set, a benchmark for 3D-QSAR, and an acetylcholine esterase inhibitor data set. Our results show that the modification enhanced both the prediction accuracy and the 3D graphical analysis of Logistic CoMFA. The improved graphical analysis yielded more accurate information on the binding mode between proteins and ligands than conventional CoMFA.
INTRODUCTION

Quantitative structure–activity relationships (QSAR) are used to establish a correlation between chemical structure and specific biological activity,1 and the derived models are used to predict the activity of untested compounds. Establishing this correlation is one of the most important steps in drug discovery, particularly in the hit-to-lead stage. The prevalence of commercial cheminformatics tools, such as Sybyl, makes it convenient to perform three-dimensional QSAR (3D-QSAR) analysis. In particular, comparative molecular field analysis (CoMFA)2 is a widely used approach for generating descriptors based on 3D structural information of molecules.

In real screening, many assays measure the biological activity of compounds only on a rating scale. Under such circumstances, it is difficult to obtain good CoMFA models, since relatively accurate experimental IC50 and pKi values are generally required. In a preceding paper, we proposed an ordinal classification approach using CoMFA (Logistic CoMFA) and showed that this approach is better and more robust than conventional CoMFA when activity is given on a rating scale.3 Logistic CoMFA couples CoMFA with ordinal logistic regression (OLR), which classifies samples according to the probability of each rank.

Unfortunately, ordinary algorithms for logistic regression analysis do not converge in some cases.4,5 Infinite parameter estimates can occur depending on the configuration of the sample points in the observation space.6 Using PLS with penalized logistic regression, Fort and Lacroix proposed a robust method for sample classification.7 Their method is based on the ridge estimators in logistic regression reported by Le Cessie and Van Houwelingen.8 Both methods, however, were developed for analyzing a binary response variable and cannot be applied to ordinal classification.
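The convergence problem and the effect of a ridge penalty can be illustrated with a minimal sketch. It is not the method of this paper: it uses a binary response, a single slope parameter, and plain gradient ascent, and the toy data, the function name `fit_slope`, the penalty value `ridge=0.1`, and the step settings are all assumptions chosen for illustration. On perfectly separated data the unpenalized slope keeps growing with the number of iterations, whereas the ridge-penalized slope settles at a finite value.

```python
import math


def sigmoid(z):
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def fit_slope(x, y, ridge=0.0, steps=2000, lr=0.5):
    """Gradient ascent on the log-likelihood of P(y=1|x) = sigmoid(b * x),
    minus an optional ridge penalty ridge * b**2 (hypothetical helper)."""
    b = 0.0
    n = len(x)
    for _ in range(steps):
        grad = sum((yi - sigmoid(b * xi)) * xi for xi, yi in zip(x, y))
        grad -= 2.0 * ridge * b  # derivative of the penalty term
        b += lr * grad / n
    return b


# Toy data with perfect separation: x < 0 is always class 0, x > 0 always class 1.
x = [-2.0, -1.0, -0.5, 0.5, 1.0, 2.0]
y = [0, 0, 0, 1, 1, 1]

# Without a penalty the estimate depends on how long we iterate (it diverges).
b_short = fit_slope(x, y, steps=200)
b_long = fit_slope(x, y, steps=20000)

# With a ridge penalty the estimate is essentially the same for both run lengths.
b_ridge_short = fit_slope(x, y, ridge=0.1, steps=200)
b_ridge_long = fit_slope(x, y, ridge=0.1, steps=20000)

print("unpenalized:", b_short, b_long)
print("ridge      :", b_ridge_short, b_ridge_long)
```

The penalty makes the objective strictly concave with a finite maximizer, which is why the two penalized runs agree while the unpenalized ones do not.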
Although multigroup iteratively reweighted partial least squares (MIRWPLS) was generalized for multigroup classification by Ding and Gentleman,9 this method is in principle used to analyze nominal data. Thus, MIRWPLS treats the ratings as nominal categories with no special ordering. Additionally, parameters are difficult to estimate with MIRWPLS because, unlike its counterpart for binary data analysis, it uses a sparse matrix of explanatory variables that is (number of classes − 1)²-fold larger. CoMFA typically uses more than 1000 explanatory variables and therefore requires a very large variable matrix. Hence, applying MIRWPLS to CoMFA is impractical.

Recently, Bastien et al.10 devised a new logistic PLS algorithm, which is based on the PLS generalized linear regression (PLS-GLR) model. The PLS-GLR approach performs OLR analysis on every explanatory variable without enlarging the explanatory matrix while the latent variables are computed.

In this study, we modified Logistic CoMFA in two ways. The first is the use of PLS-GLR in Logistic CoMFA in place of ordinary OLR-based PLS. The second is the incorporation of ridge penalty estimation into Logistic CoMFA, motivated by the convergence problems that, as mentioned above, can occur in some cases. We then compared the modified ordinal classification CoMFA with the original Logistic CoMFA.

* Corresponding author e-mail: satan@gen-info.osaka-u.ac.jp.
Graduate School of Pharmaceutical Sciences, Osaka University.
Medicinal Chemistry Laboratory, Mitsubishi Tanabe Pharma Corporation.
§ Corporate Strategy Department, Mitsubishi Tanabe Pharma Corporation.
| Research Institute for Microbial Diseases, Osaka University.
Faculty of Pharmacy, Osaka Ohtani University.

J. Chem. Inf. Model. 2008, 48, 910–917. DOI: 10.1021/ci700444z. © 2008 American Chemical Society. Published on Web 03/14/2008.
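The per-variable regression step that the Introduction attributes to PLS-GLR can be sketched as follows. This is only an illustration under stated assumptions: a binary logistic fit stands in for the ordinal logistic regression, only the first latent variable is built, no significance filtering of coefficients is applied, and the toy data and the helper names (`univariate_logit_slope`, `standardize`, `first_component`) are hypothetical. Each explanatory variable is regressed against the response on its own, the resulting slopes are normalized into a weight vector, and the first component is the weighted sum of the standardized variables, so the explanatory matrix is never enlarged.

```python
import math


def sigmoid(z):
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def univariate_logit_slope(x, y, steps=5000, lr=0.5):
    """Fit P(y=1|x) = sigmoid(a + b*x) by gradient ascent and return b."""
    a = b = 0.0
    n = len(x)
    for _ in range(steps):
        resid = [yi - sigmoid(a + b * xi) for xi, yi in zip(x, y)]
        a += lr * sum(resid) / n
        b += lr * sum(ri * xi for ri, xi in zip(resid, x)) / n
    return b


def standardize(col):
    # Center to zero mean and scale to unit sample standard deviation.
    m = sum(col) / len(col)
    s = math.sqrt(sum((v - m) ** 2 for v in col) / (len(col) - 1))
    return [(v - m) / s for v in col]


def first_component(columns, y):
    """Return (weights, scores) of the first latent variable.

    One separate logistic fit per explanatory variable; the slopes,
    scaled to unit length, are the weights."""
    z = [standardize(c) for c in columns]
    slopes = [univariate_logit_slope(zj, y) for zj in z]
    norm = math.sqrt(sum(s * s for s in slopes))
    w = [s / norm for s in slopes]
    t = [sum(w[j] * z[j][i] for j in range(len(z))) for i in range(len(y))]
    return w, t


# Toy data (assumed): three variables as columns, eight samples, binary response.
y = [0, 0, 0, 1, 0, 1, 1, 1]
cols = [
    [1, 2, 3, 4, 5, 6, 7, 8],  # increases with the response
    [8, 7, 6, 5, 4, 3, 2, 1],  # mirror image of the first variable
    [3, 1, 4, 1, 5, 9, 2, 6],  # weakly related to the response
]

w, t = first_component(cols, y)
print("weights:", w)
print("scores :", t)
```

In this toy case the first two variables receive weights of equal size and opposite sign, and samples of class 1 obtain higher scores on average than samples of class 0.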