Ordinal Classification Using Comparative Molecular Field Analysis
Takanori Ohgaru,†,‡ Ryo Shimizu,‡ Kousuke Okamoto,† Masaya Kawase,§ Yuko Shirakuni,† Rika Nishikiori,§ and Tatsuya Takagi*,†,|
Graduate School of Pharmaceutical Sciences, Osaka University, 1-6 Yamadaoka, Suita, Osaka 565-0871, Japan,
Tanabe Seiyaku Co., Ltd., 3-16-89 Kashima, Yodogawa-Ku, Osaka, 532-8505, Japan, Faculty of Pharmacy,
Osaka Ohtani University, 3-11-1 Nishikiorikita, Tondabayashi, Osaka, 584-8540, Japan, and Research Institute
for Microbial Diseases, Osaka University, 3-1 Yamadaoka, Suita, Osaka, 565-0871, Osaka, Japan
Received July 4, 2007
Comparative Molecular Field Analysis (CoMFA) is one of the most widely used 3-dimensional QSAR
(3D-QSAR) methods for identifying the relationship between chemical structure and biological activity.
Conventional CoMFA requires experimental data spanning at least 3 orders of magnitude, such as IC₅₀ and Kᵢ values, to obtain a good
model, although practically there are many screening assays where biological activity is measured only by
a rating scale. Hence, rating classification-oriented CoMFA coupled with ordinal logistic regression has
been developed, and its predictive ability and 3D graphical analysis ability have been investigated. As a
result, this novel CoMFA (Logistic CoMFA) has been found to be more robust than conventional CoMFAs
in both predictive and 3D graphical analysis abilities. Furthermore, Logistic CoMFA is useful since it can
provide the probability of each rank.
INTRODUCTION
A detailed understanding of the quantitative structure-
activity relationship (QSAR) is one of the principal goals of
medicinal chemistry. Clarifying the relationship
between chemical structure and biological activity is very
important, particularly in the hit-to-lead stage of drug
discovery. Researchers need to identify various properties
of a large number of compounds within the limited period of the
hit-to-lead stage. The growing need for early ADMET¹⁻³ profiling
increases the number of biological assays per compound, such as
Caco-2 cell permeability, CYP family inhibition, and hERG
blockade. Unfortunately, screening data from the early
stage of drug discovery contain more experimental error
than data from the reliable assays employed in
the late stage. QSAR analysis generally uses
IC₅₀ and pKᵢ values as the indices of biological response,
but non-negligible differences between experimental and true
IC₅₀/pKᵢ values can be found in some screening assays.⁴,⁵ In
addition, there are many in vivo assays where biological
activity is measured only by a rating scale. These circum-
stances make it difficult to build a good QSAR model.
Predicting an activity rating, in which the potency of a
compound is assigned only roughly, enables quantitative
analysis of data sets that could not previously be
analyzed quantitatively because of noise. Treatment of the
data would be necessary to determine the rating
classification since the ratings are not expressed on a metric
scale.
Several studies have applied rating classification
to classical QSAR. Martin et al.⁶
conducted a classical QSAR analysis of monoamine oxidase
inhibition by using a rating scale with linear discriminant
analysis (LDA). Dunn et al.⁷ analyzed the QSAR
of β-adrenergic agents with the SIMCA (Soft Independent
Modelling of Class Analogy) method,⁸ which is based on a
pattern recognition technique. LDA and SIMCA
are not considered suitable for rating classification
because both assume that classes
are independent. To harness the characteristics of ordinal
classes, Takahashi et al.⁹ developed ORMUCS (ORdered
Multicategorical Classification using Simplex optimization
technique). ORMUCS is also a pattern recognition method
that determines a discriminant function using simplex
optimization. Apart from these methods, it is possible to
apply ordinal logistic regression (OLR) to QSAR
analysis. OLR is a statistical method that uses
the probability of each rating for classification. In fact, OLR
is one of the most popular methods in social psychological
studies and is increasingly applied to clinical data.¹⁰,¹¹
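To illustrate how OLR can yield a probability for each rating, the following is a minimal sketch that assumes the common proportional-odds (cumulative logit) form, P(Y ≤ k | x) = σ(θ_k − s), where s is a linear score computed from the descriptors. The text above does not state which OLR variant is used, so the function name, the score, and the threshold values here are purely illustrative assumptions rather than the authors' implementation.

```python
import math


def sigmoid(z: float) -> float:
    """Standard logistic function."""
    return 1.0 / (1.0 + math.exp(-z))


def rating_probabilities(score: float, thresholds: list[float]) -> list[float]:
    """Return P(rating = k) for k = 0..K-1 under a proportional-odds model.

    `score` is a hypothetical linear predictor (e.g. a weighted sum of
    descriptors); `thresholds` are K-1 increasing cut points (assumed values).
    """
    # Cumulative probabilities P(rating <= k); the last class closes at 1.0.
    cumulative = [sigmoid(t - score) for t in thresholds] + [1.0]
    probs = []
    previous = 0.0
    for c in cumulative:
        # Each class probability is the difference of adjacent cumulatives.
        probs.append(c - previous)
        previous = c
    return probs


# Illustrative values only: three ratings separated by two cut points.
probs = rating_probabilities(score=0.5, thresholds=[-1.0, 1.0])
predicted = max(range(len(probs)), key=probs.__getitem__)
print(probs, predicted)
```

The predicted rating is simply the class with the largest probability, while the full probability vector remains available when the confidence of the assignment matters.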
Comparative Molecular Field Analysis (CoMFA) has
become one of the most widely used 3-dimensional QSAR
(3D-QSAR) methods¹²⁻¹⁴ since it was introduced by Cramer
et al.¹⁵ to identify the relationship between 3-dimensional
molecular structure and biological activity. The prevalence of
commercial cheminformatics tools, such as Sybyl, and of
high-performance CPUs makes 3D-QSAR analysis convenient
to use. However, a rating classification-oriented 3D-QSAR
method has not yet been developed. Considering the
prevalence of 3D-QSAR, it is desirable to classify ratings with
a 3D-QSAR method. 3D-QSAR analysis with SIMCA has
not been used for rating classification and, unfortunately,
has been limited to dichotomous (active/inactive) analysis¹⁶
or selectivity analysis.¹⁷
In this study, we present the development and applicability
of a novel rating classification-oriented CoMFA with OLR
and compare it to conventional CoMFA analysis using 2 data
sets. One data set is the corticosteroid binding globulin
* Corresponding author e-mail: satan@gen-info.osaka-u.ac.jp.
† Graduate School of Pharmaceutical Sciences, Osaka University.
‡ Tanabe Seiyaku Co., Ltd.
§ Osaka Ohtani University.
| Research Institute for Microbial Diseases, Osaka University.
207 J. Chem. Inf. Model. 2008, 48, 207-212
10.1021/ci700238k CCC: $40.75 © 2008 American Chemical Society
Published on Web 12/28/2007