Benchmarking of Linear and Nonlinear Approaches for Quantitative
Structure-Property Relationship Studies of Metal Complexation with Ionophores
Igor V. Tetko†
Institute of Bioorganic & Petrochemistry, Kiev, Ukraine
Vitaly P. Solov’ev
Institute of Physical Chemistry, Russian Academy of Sciences, Leninskiy prospect 31a,
119991 Moscow, Russia
Alexey V. Antonov
Institute for Bioinformatics, Neuherberg D-85764, Germany
Xiaojun Yao, Jean Pierre Doucet, and Botao Fan
Université Paris 7-Denis Diderot, ITODYS-CNRS UMR 7086, 1, rue Guy de la Brosse, Paris 75005, France
Frank Hoonakker, Denis Fourches, Piere Jost, Nicolas Lachiche, and Alexandre Varnek*
Laboratoire d’Infochimie, UMR 7551 CNRS, Université Louis Pasteur,
4, rue B. Pascal, Strasbourg 67000, France
Received September 24, 2005
A benchmark of several popular methods, Associative Neural Networks (ANN), Support Vector Machines
(SVM), k Nearest Neighbors (kNN), Maximal Margin Linear Programming (MMLP), Radial Basis Function
Neural Network (RBFNN), and Multiple Linear Regression (MLR), is reported for quantitative structure-property
relationships (QSPR) of stability constants log K₁ for the 1:1 (M:L) and log β₂ for 1:2 complexes of
metal cations Ag⁺ and Eu³⁺ with diverse sets of organic molecules in water at 298 K and ionic strength 0.1
M. The methods were tested on three types of descriptors: molecular descriptors including E-state values,
counts of atoms determined for E-state atom types, and substructural molecular fragments (SMF). Comparison
of the models was performed using a 5-fold external cross-validation procedure. Robust statistical tests
(bootstrap and Kolmogorov-Smirnov statistics) were employed to evaluate the significance of calculated
models. The Wilcoxon signed-rank test was used to compare the performance of methods. Individual
structure-complexation property models obtained with nonlinear methods demonstrated a significantly better
performance than the models built using multilinear regression analysis (MLRA). However, the averaging
of several MLRA models based on SMF descriptors provided predictions as good as those of the most efficient
nonlinear techniques. Support Vector Machines and Associative Neural Networks contributed to the largest
number of significant models. Models based on fragments (SMF descriptors and E-state counts) had higher
prediction ability than those based on E-state indices. The use of SMF descriptors and E-state counts provided
similar results, whereas E-state indices led to less significant models. The current study illustrates the
difficulties of quantitative comparison of different methods: conclusions based only on one data set without
appropriate statistical tests could be wrong.
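The comparison protocol summarized above (a 5-fold external split, followed by a Wilcoxon signed-rank test on paired results) can be illustrated with a minimal sketch. It is not the authors' implementation: the function names `five_fold_indices` and `signed_rank_test` are hypothetical, the RMSE values are invented for illustration only, and the pairing of errors per data set is an assumption about how the test is applied.

```python
import itertools
import random


def five_fold_indices(n_items, n_folds=5, seed=0):
    """Shuffle item indices and deal them into n_folds disjoint test sets."""
    order = list(range(n_items))
    random.Random(seed).shuffle(order)
    return [order[i::n_folds] for i in range(n_folds)]


def signed_rank_test(errors_a, errors_b):
    """Exact two-sided Wilcoxon signed-rank test on paired errors.

    Returns (w_plus, p_value). Zero differences are dropped and tied
    absolute differences receive average ranks. The exact p-value is
    found by enumerating all sign assignments, so keep n small.
    """
    diffs = [a - b for a, b in zip(errors_a, errors_b) if a != b]
    n = len(diffs)
    by_size = sorted(range(n), key=lambda i: abs(diffs[i]))
    ranks = [0.0] * n
    i = 0
    while i < n:
        j = i
        while j + 1 < n and abs(diffs[by_size[j + 1]]) == abs(diffs[by_size[i]]):
            j += 1
        avg = (i + j) / 2 + 1  # average of the 1-based ranks i+1 .. j+1
        for k in range(i, j + 1):
            ranks[by_size[k]] = avg
        i = j + 1
    # Sum of ranks of the pairs where method A has the larger error.
    w_plus = sum(r for r, d in zip(ranks, diffs) if d > 0)
    centre = sum(ranks) / 2
    observed = abs(w_plus - centre)
    hits = 0
    for signs in itertools.product((0, 1), repeat=n):
        w = sum(r for r, s in zip(ranks, signs) if s)
        if abs(w - centre) >= observed - 1e-12:
            hits += 1
    return w_plus, hits / 2 ** n


# Each molecule falls into exactly one of the five external test sets.
folds = five_fold_indices(12)
print([len(f) for f in folds])

# Invented RMSE values (log units) for six hypothetical data sets.
rmse_linear = [1.20, 1.05, 1.40, 0.98, 1.31, 1.17]
rmse_nonlinear = [1.02, 0.99, 1.15, 0.95, 1.10, 1.08]
w_plus, p_value = signed_rank_test(rmse_linear, rmse_nonlinear)
print(w_plus, p_value)  # w_plus = 21.0, p = 0.03125
```

With only six pairs, the smallest attainable two-sided p-value is 2/64, which shows why a conclusion drawn from a single data set carries little statistical weight.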
INTRODUCTION
An important branch of supramolecular chemistry is the
chemistry of ionophores, molecules possessing high affinity
toward metal cations in solution. Their ability to bind cations
selectively is widely used in practice for the separation and
concentration of metals (solvent extraction) and in analytical
devices (ion-selective electrodes, CHEMFETs, etc.).¹
Experimental measurements of stability constants of ionophore-metal
complexes and related free energies of complexation
reactions are rather difficult and costly.
That is why theoretical quantitative estimation of complex
stabilities might become an important complement to
experimental studies, giving researchers a way to
reduce the number of experiments and indicating a strategy
for the “optimization” of known metal binders.
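For reference, the modeled quantities can be written out as follows. This is a sketch assuming the conventional definitions of the stepwise constant for the 1:1 complex and the overall constant for the 1:2 complex, expressed through equilibrium concentrations; the symbols [M], [L], R, and T are the usual ones and are not defined in this section.

```latex
K_1 = \frac{[\mathrm{ML}]}{[\mathrm{M}][\mathrm{L}]}, \qquad
\beta_2 = \frac{[\mathrm{ML_2}]}{[\mathrm{M}][\mathrm{L}]^2}, \qquad
\Delta G^\circ = -RT \ln K = -2.303\,RT \log K
```

At 298 K the factor 2.303 RT is about 5.7 kJ/mol, so an error of one log unit in a stability constant corresponds to roughly 5.7 kJ/mol in the free energy of complexation.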
The thermodynamic complexation properties depend on
many parameters: the nature of the metal, the structure of the
ionophore, solvent, counterion(s), temperature, and background
compounds. In experiments, even small inaccuracies
in measuring species concentration or temperature may lead
to errors in complexation constants of up to several log units.²,³
Several theoretical approaches can be used to assess
free energies of complexation. Quantum mechanics calculations
in the gas phase can hardly be recommended for these
* Corresponding author e-mail: varnek@chimie.u-strasbg.fr.
† Current address: Institute for Bioinformatics, Neuherberg D-85764,
Germany. http://www.vcclab.org.
808 J. Chem. Inf. Model. 2006, 46, 808-819
10.1021/ci0504216 CCC: $33.50 © 2006 American Chemical Society
Published on Web 01/17/2006