Determination of Abraham Solute Parameters from Molecular Structure
Jesús Jover, Ramón Bosque, and Joaquim Sales*
Departament de Química Inorgànica, Universitat de Barcelona, Martí i Franquès, 1, 08028-Barcelona, Spain
Received February 6, 2004
The Abraham solute parameters are well-known factors for the quantitative description of solute/solvent
interactions. A quantitative structure-property relationship (QSPR) is reported for the E, S, A, and B
parameters of a large set of 457 solutes, of very different chemical nature. The proposed models, derived
from multilinear regression analysis (MLRA) and computational neural networks (CNN), contain five
descriptors calculated solely from the molecular structure of compounds. Good correlations were obtained
for the four parameters studied, and the corresponding values of R² and standard deviations are better than or
similar to those derived from other theoretical bases. All models were validated by external prediction
sets. The proposed QSPR models, both by MLRA and CNN, contain analogous descriptors encoding similar
information that agrees with the accepted physicochemical meaning of the Abraham parameters; however,
some descriptors that encode information not associated with this physicochemical meaning are
also included in the QSPR models.
INTRODUCTION
Solute/solvent interactions are of major importance
in chemistry and biochemistry because most chemical
reactions occur in solution. One of the most widely used
approaches to studying these interactions relies on empirical
equations that relate a selected property to parameters of
solutes and/or solvents.
Abraham and co-workers¹ have proposed the general
solvation equation

$$\log \mathrm{SP} = c + eE + sS + aA + bB + vV$$

to correlate solute properties (SP), such as partitioning,²,³
solubility,⁴ characterization of the selectivity of micellar
electrokinetic chromatography systems,⁵ blood-brain
distribution,⁶ and human intestinal absorption,⁷ with a standard set
of five parameters. In this equation, E is the excess molar
refraction, which is obtained from the refractive index. S is the
dipolarity/polarizability, which can be obtained from gas-liquid
chromatographic measurements on polar stationary phases
or, more generally, from water/solvent partitions. The parameters
A and B are the overall or effective hydrogen bond
acidity and basicity, respectively, which are most easily
obtained from water-solvent partitions. V is the McGowan
characteristic volume, which can readily be calculated from
bond and atom contributions.
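As a minimal sketch of how the general solvation equation, log SP = c + eE + sS + aA + bB + vV, is evaluated once the solute parameters are known, the Python fragment below computes V from atom and bond contributions and then sums the five terms. The McGowan increments are standard literature values and are not given in this paper. The class and function names, the E, S, A, and B values, and the system coefficients are all hypothetical placeholders chosen for illustration.

```python
from dataclasses import dataclass
from typing import Mapping

# McGowan atomic increments in cm^3/mol. These come from the general
# literature on the McGowan volume, not from this paper.
ATOM_VOLUME = {"C": 16.35, "H": 8.71, "N": 14.39, "O": 12.43}
BOND_CORRECTION = 6.56  # subtracted once per bond, whatever its order


def mcgowan_volume(atom_counts: Mapping[str, int], n_rings: int = 0) -> float:
    """Return V in the customary units of (cm^3/mol)/100."""
    n_atoms = sum(atom_counts.values())
    n_bonds = n_atoms - 1 + n_rings  # bond count of a connected molecule
    atoms_total = sum(ATOM_VOLUME[el] * n for el, n in atom_counts.items())
    return (atoms_total - BOND_CORRECTION * n_bonds) / 100.0


@dataclass(frozen=True)
class Solute:
    """Abraham solute parameters."""
    E: float
    S: float
    A: float
    B: float
    V: float


@dataclass(frozen=True)
class System:
    """Coefficients characterizing one phase system."""
    c: float
    e: float
    s: float
    a: float
    b: float
    v: float


def log_sp(solute: Solute, system: System) -> float:
    """Evaluate log SP = c + eE + sS + aA + bB + vV."""
    return (system.c + system.e * solute.E + system.s * solute.S
            + system.a * solute.A + system.b * solute.B
            + system.v * solute.V)


# Benzene: 6 C, 6 H, one ring. V is computed; E, S, A, B are placeholders.
benzene_v = mcgowan_volume({"C": 6, "H": 6}, n_rings=1)
solute = Solute(E=0.6, S=0.5, A=0.0, B=0.1, V=benzene_v)
system = System(c=0.1, e=0.5, s=-1.0, a=0.0, b=-3.5, v=3.8)  # hypothetical
predicted = log_sp(solute, system)
```

Keeping solute parameters and system coefficients in separate records mirrors the structure of the equation: the same solute can be paired with any number of phase systems without recomputing its parameters.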
These parameters represent the solute influence on various
solute/solvent phase interactions. Hence, the coefficients c,
e, s, a, b, and v, which are obtained via multiple linear
regression against known log SP values, correspond to the
complementary effect of the phases on these interactions.
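The regression step described above can be sketched as an ordinary least-squares fit. The example below assumes NumPy is available and uses synthetic, noise-free data in place of measured log SP values. The function name and the "true" coefficients are hypothetical and serve only to show that the fit recovers them.

```python
import numpy as np


def fit_system_coefficients(descriptors: np.ndarray, log_sp: np.ndarray) -> dict:
    """Fit c, e, s, a, b, v by multiple linear regression.

    descriptors: array of shape (n_solutes, 5), columns ordered E, S, A, B, V.
    log_sp: array of shape (n_solutes,) with the measured property values.
    """
    # Prepend a column of ones so the intercept c is fitted with the slopes.
    design = np.column_stack([np.ones(len(descriptors)), descriptors])
    coefficients, *_ = np.linalg.lstsq(design, log_sp, rcond=None)
    return dict(zip("cesabv", coefficients))


# Synthetic solutes and hypothetical coefficients, for illustration only.
rng = np.random.default_rng(0)
descriptors = rng.uniform(0.0, 1.5, size=(40, 5))
true_coefficients = np.array([0.1, 0.5, -1.0, 0.2, -3.5, 3.8])  # c, e, s, a, b, v
log_sp_values = true_coefficients[0] + descriptors @ true_coefficients[1:]

fitted = fit_system_coefficients(descriptors, log_sp_values)
```

With real data the fit would not be exact, and the residual standard deviation and R² would be reported alongside the coefficients.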
The coefficients can be regarded as system constants which
characterize and contain chemical information of the phase
in question and can be interpreted as follows. The e-
coefficient shows the tendency of the phase to interact with
solutes through π and n-electron pairs. Usually the e-
coefficient is positive, but for a phase which contains fluorine
atoms, it can be negative. The s-coefficient represents the
tendency of the phase to interact with dipolar/polarizable
solutes. The a-coefficient denotes the hydrogen bond basicity
of the phase, because acidic solutes will interact with basic
phases, and the b-coefficient is a measure of the hydrogen
bond acidity of the phase. The v-coefficient is a measure of
the hydrophobicity of the phase, and it describes the
dispersion interactions and cavitation forces.
Any application of the general solvation equation depends
on the availability of the solute parameters, and the need to
calculate them for new compounds will always be of primary
importance. As explained earlier, the descriptors E and V
can be calculated quite simply from structure, but the
remaining three descriptors S, A, and B have to be determined
experimentally, either directly from complexation measure-
ments or indirectly via back-calculations from partition
measurements. It is therefore not surprising that various
attempts have been made to avoid the need for experimental
data in determining new S, A, and B values.
Such attempts include the work of Sevcik and co-workers,⁸
who have reported multilinear regression and neural network
approaches to estimate the S parameter from a set of 333
compounds using 29 molecular descriptors. Platts et al., using
ab initio and DFT methods, have estimated the S,⁹ A,¹⁰ and B¹¹
Abraham parameters for sets of 50-80 compounds. More
recently, the same authors have also applied DFT methods
to the estimation of A and B parameters for multifunctional
acids and bases.¹² On the other hand, an additive model for
the estimation of the five solute parameters E, S, A, B, and
V has also been proposed.¹³ This model was developed from
a set of 81 atom and functional group fragments and
intramolecular interactions for which an evaluation of their
contribution to each parameter was carried out through a
process of multiple linear regressions. The method gives good
results for predicting parameters, but as with all group
* Corresponding author fax: +34934907725; e-mail: joaquim.sales@qi.ub.es.
$$\log \mathrm{SP} = c + eE + sS + aA + bB + vV$$
1098 J. Chem. Inf. Comput. Sci. 2004, 44, 1098-1106
10.1021/ci049943w CCC: $27.50 © 2004 American Chemical Society
Published on Web 04/17/2004