Application of ALOGPS 2.1 to Predict log D Distribution Coefficient for Pfizer Proprietary Compounds
Igor V. Tetko* and Gennadiy I. Poda‡
Biomedical Department, Institute of Bioorganic and
Petroleum Chemistry, Ukrainian Academy of Sciences,
Murmanskaya 1, Kyiv, 02094, Ukraine, and Structural and
Computational Chemistry, Pfizer Global Research and
Development, 700 Chesterfield Parkway West,
Chesterfield, Missouri 63017
Received June 22, 2004
Abstract: Evaluation of the ALOGPS, ACD Labs LogD, and
PALLAS PrologD suites to calculate the log D distribution
coefficient resulted in a high root-mean-squared error (rmse)
of 1.0-1.5 log units for two in-house Pfizer log D data sets
of 17 861 and 640 compounds. Inaccuracy in log P prediction
was the limiting factor for the overall log D estimation by
these algorithms. The self-learning feature of ALOGPS (LIBRARY
mode) markedly improved the accuracy of log D prediction,
and an rmse of 0.64-0.65 was calculated for both data sets.
Oral bioavailability of chemicals is a very important
pharmacokinetic parameter in drug development. To
reach the target enzyme in the human body, drugs have
to cross barriers by passive diffusion or carrier-mediated
uptake. The 1-octanol-water partition coefficient, log P,
is well-known as one of the principal parameters to
estimate lipophilicity (or solubility in lipids) of chemical
compounds and, to a large degree, determines their
pharmacokinetic properties. The log P is also used as
one of the standard properties identified by Lipinski in
the “rule of 5” for druglike molecules.¹ By definition,
log P refers to neutral molecules. If a molecule contains
basic or acidic groups, it can become ionized, and its
distribution in octanol-water becomes pH-dependent.
The pH-dependent distribution coefficient, log D, has been
shown to correlate with a number of biological parameters,
such as the effective permeability in human jejunum,²
blood-brain barrier (BBB) permeability,³ plasma protein
binding,⁴ CYP 450 oxidation,⁵ and volume of distribution
(VD).⁶,⁷ To be absorbed by passive diffusion through the gut
wall, oral drugs should have a lipophilicity within a given
range (usually between 1 and 4 on the log D scale).
Both log P and log D are very important
parameters in drug development,⁸ and thus there is a
need to develop new methods to accurately calculate
them from chemical structures. Currently, the publicly
available experimental log P data comprise
tens of thousands of compounds.⁹ These resources
stimulated the development of a number of programs to
calculate log P.¹⁰⁻¹⁵
The problem of predicting log D is more
complicated. As a rule, it is computed from log P and
pKa, assuming that only the neutral form partitions into
the organic phase, according to eq 1,¹²,¹⁶ where Δᵢ = {1, −1}
for acids and bases, respectively.
If several groups can be ionized, the equation is
modified accordingly to incorporate correction terms for
all of them. Thus, the log D prediction potentially
accumulates errors from both the log P and pKa
predictions.
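Equation 1 can be evaluated directly for a compound with a single ionizable group. The following is a minimal sketch, not code from ALOGPS or any of the programs compared here; the function name `log_d` and its parameters are hypothetical, and the default pH of 7.4 is simply the value used for the comparisons in this study.

```python
import math


def log_d(log_p: float, pka: float, ph: float = 7.4, is_acid: bool = True) -> float:
    """Evaluate eq 1 for one ionizable group.

    Assumes, as eq 1 does, that only the neutral form partitions
    into the organic phase. Hypothetical helper for illustration.
    """
    delta = 1.0 if is_acid else -1.0  # +1 for acids, -1 for bases
    return log_p - math.log10(1.0 + 10.0 ** ((ph - pka) * delta))


# An acid with pKa 4.4 is almost fully ionized at pH 7.4,
# so its log D lies about 3 log units below its log P.
acid_log_d = log_d(log_p=3.0, pka=4.4)

# A base with pKa 9.4 is about 99% protonated at pH 7.4,
# so its log D lies about 2 log units below its log P.
base_log_d = log_d(log_p=3.0, pka=9.4, is_acid=False)
```

Because the correction term depends exponentially on pH − pKa, an error of one unit in the predicted pKa of a strongly ionized group shifts log D by nearly one log unit, in addition to any error in log P.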
Development of computational approaches is further
complicated by the absence of publicly available
large data sets with experimental log D values. As a
result, only a few programs are available to estimate
log D.¹² A recent evaluation of two commercial
programs found a root-mean-squared error (rmse)
of 1.4-1.9 log units for a data set of about 20 000
compounds,¹⁷ which is not accurate enough for practical use.
Therefore, large pharmaceutical companies such as
Pfizer and AstraZeneca have established their own
techniques to experimentally determine log D for their
proprietary compounds.
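For reference, the rmse quoted throughout is the square root of the mean squared difference between predicted and measured values. A minimal sketch follows; the function name `rmse` is a hypothetical helper, not part of any program discussed here.

```python
import math


def rmse(predicted, observed):
    """Root-mean-squared error between two equal-length sequences."""
    if len(predicted) != len(observed) or len(predicted) == 0:
        raise ValueError("sequences must be non-empty and of equal length")
    squared = [(p - o) ** 2 for p, o in zip(predicted, observed)]
    return math.sqrt(sum(squared) / len(squared))
```

Since the typical experimental error in log D is itself a few tenths of a log unit, the rmse of a prediction method cannot meaningfully fall much below that level.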
The ALOGPS program¹⁸⁻²⁰ (http://www.vcclab.org)
was developed using the associative neural network
(ASNN) method.²¹,²² The ASNN makes it possible
to include new data into the memory of neural nets
without retraining the neural networks themselves, in
the so-called LIBRARY mode (hereafter LIBRARY).¹⁹ The
LIBRARY dramatically improved the accuracy of the
ALOGPS program for log P prediction using in-house
data sets from BASF,²¹ Pfizer,²³ and AstraZeneca.²⁴,²⁵
The current study demonstrates that ALOGPS is also able
to reliably predict the pH-dependent distribution
coefficient, log D.
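The text above states only that the LIBRARY mode adds new data to the memory of the networks without retraining them. One way to picture a memory-based correction of this kind is a nearest-neighbor adjustment: the model's prediction for a query compound is shifted by the average error the model makes on the most similar library compounds. The sketch below illustrates that general idea under loud assumptions; it is not the ALOGPS or ASNN implementation, and the function name, the Euclidean distance, the descriptor vectors, and the choice of k are all hypothetical.

```python
import math


def library_corrected(query_vec, query_pred, library, k=3):
    """Shift a model prediction by the mean error on the k nearest library entries.

    library: list of (vector, model_prediction, measured_value) tuples.
    Illustrative only; the similarity measure and k are assumptions.
    """
    if not library:
        return query_pred  # nothing in memory, keep the blind prediction
    nearest = sorted(library, key=lambda entry: math.dist(query_vec, entry[0]))[:k]
    bias = sum(measured - predicted for _, predicted, measured in nearest) / len(nearest)
    return query_pred + bias
```

In such a scheme, adding compounds to the library changes the predictions immediately, because only the stored examples change while the underlying model stays fixed.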
The octanol-water partition data used in this study
were collected at two Pfizer sites and formed two
data sets. The first data set included 669 legacy Pharmacia
compounds with log D values measured by a
medium-throughput method using a nitrogen detector
(called the NlogD set). A typical experimental error in
log D measurements is about 0.3-0.5 log units. The
second data set (ElogD set) included 18 889 compounds
measured using the ElogD method.²⁶,²⁷ An inspection
of the compounds indicated that the two sets did not
overlap. For compounds that had multiple measurements,
average values were used. Also, because the ALOGPS
method does not take stereochemistry into account,
average values were used for stereoisomers. After
removal of structural duplicates and stereoisomers, the
numbers of compounds decreased to 640 and 17 861 for
the NlogD and ElogD data sets, respectively.
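The averaging step described above can be sketched as a simple grouping operation. This is an illustration only: it assumes each measurement already carries a structure key from which stereochemical information has been removed (how such a key is generated is not specified here), and it pools repeated measurements and stereoisomers into a single mean, which may differ from the exact procedure used. The function name `average_by_structure` is hypothetical.

```python
from collections import defaultdict
from statistics import mean


def average_by_structure(records):
    """Collapse (structure_key, log_d) pairs to one mean log D per key.

    structure_key is assumed to be stereo-free, so repeated measurements
    and stereoisomers of one compound share the same key.
    """
    grouped = defaultdict(list)
    for key, value in records:
        grouped[key].append(value)
    return {key: mean(values) for key, values in grouped.items()}


# Three measurements, two unique stereo-free structures (made-up keys and values).
merged = average_by_structure([("A", 1.0), ("A", 2.0), ("B", 3.0)])
```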
For comparison, ACD Labs LogD v.7.19²⁸ and
PALLAS PrologD software²⁹ were used to calculate
log D values at pH 7.4 for the ElogD and NlogD data sets.
The stand-alone graphical interface versions of
ALOGPS and ASNN were used to analyze the
compounds using three protocols.
In the first protocol, the ALOGPS program was used
“as is” to make blind predictions for the molecules in
each data set.
In the second protocol, the self-learning feature
implemented as a “LIBRARY” mode of ALOGPS 2.1 was
* To whom correspondence should be addressed. Address: Institute
for Bioinformatics GSF, Forschungszentrum für Umwelt und Gesundheit,
GmbH, Ingolstädter Landstrasse 1, D-85764 Neuherberg, Germany.
Phone: +49-89-3187-3575. Fax: +49-89-3187-3585. E-mail:
itetko@vcclab.org.
Institute of Bioorganic and Petroleum Chemistry.
‡ Pfizer Global Research and Development.
$$\log D(\mathrm{pH}) = \log P - \log\left(1 + 10^{(\mathrm{pH} - \mathrm{p}K_a)\Delta_i}\right) \qquad (1)$$
J. Med. Chem. 2004, 47, 5601-5604
10.1021/jm049509l CCC: $27.50 © 2004 American Chemical Society
Published on Web 10/05/2004