Determination of Abraham Solute Parameters from Molecular Structure

Jesús Jover, Ramón Bosque, and Joaquim Sales*

Departament de Química Inorgànica, Universitat de Barcelona, Martí i Franquès, 1, 08028-Barcelona, Spain

Received February 6, 2004

The Abraham solute parameters are well-known factors for the quantitative description of solute/solvent interactions. A quantitative structure-property relationship (QSPR) is reported for the E, S, A, and B parameters of a large set of 457 solutes of very different chemical nature. The proposed models, derived from multilinear regression analysis (MLRA) and computational neural networks (CNN), contain five descriptors calculated solely from the molecular structure of the compounds. Good correlations were obtained for all four parameters, and the corresponding values of R² and standard deviations are better than or similar to those derived from other theoretical approaches. All models were validated with external prediction sets. The MLRA and CNN models contain analogous descriptors encoding similar information, which agrees with the accepted physicochemical meaning of the Abraham parameters; however, the QSPR models also include some descriptors that encode information not associated with this physicochemical meaning.

INTRODUCTION

Solute/solvent interactions are of major importance in chemistry and biochemistry because most chemical reactions occur in solution. One of the most widely used methods to study these interactions relies on empirical equations that relate a selected property to parameters of the solutes and/or solvents.
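The MLRA models summarized in the abstract relate one Abraham parameter to five descriptors computed from molecular structure, and they are judged by R², the standard deviation, and an external prediction set. A minimal Python sketch of that kind of workflow is given here under explicit assumptions: the descriptor values and targets are synthetic random numbers rather than the authors' 457 solutes, the 150/50 split and all coefficients are hypothetical, and ordinary least squares is solved through the normal equations for brevity. It illustrates multilinear regression with external validation in general; it is not the authors' implementation and does not cover the CNN models.

```python
import random
from typing import List, Sequence, Tuple

Matrix = Sequence[Sequence[float]]


def fit_mlr(X: Matrix, y: Sequence[float]) -> List[float]:
    """Ordinary least squares with an intercept.

    Solves the normal equations (Z^T Z) b = Z^T y, where Z is X with a
    leading column of ones, by Gauss-Jordan elimination with partial
    pivoting. Returns [intercept, w1, ..., wk].
    """
    Z = [[1.0, *row] for row in X]
    p = len(Z[0])
    # Augmented matrix [Z^T Z | Z^T y]
    A = [
        [sum(r[i] * r[j] for r in Z) for j in range(p)]
        + [sum(r[i] * t for r, t in zip(Z, y))]
        for i in range(p)
    ]
    for col in range(p):
        # Swap in the row with the largest pivot for numerical stability
        best = max(range(col, p), key=lambda k: abs(A[k][col]))
        A[col], A[best] = A[best], A[col]
        pivot = A[col][col]
        A[col] = [v / pivot for v in A[col]]
        # Eliminate this column from every other row
        for k in range(p):
            if k != col:
                factor = A[k][col]
                A[k] = [a - factor * b for a, b in zip(A[k], A[col])]
    return [A[i][p] for i in range(p)]


def predict(coef: Sequence[float], X: Matrix) -> List[float]:
    """Evaluate intercept + sum(w_j * x_j) for each row of X."""
    return [coef[0] + sum(w * x for w, x in zip(coef[1:], row)) for row in X]


def r_squared(y_obs: Sequence[float], y_pred: Sequence[float]) -> float:
    """Coefficient of determination, 1 - SS_res / SS_tot."""
    mean = sum(y_obs) / len(y_obs)
    ss_res = sum((o - p) ** 2 for o, p in zip(y_obs, y_pred))
    ss_tot = sum((o - mean) ** 2 for o in y_obs)
    return 1.0 - ss_res / ss_tot


def std_dev(y_obs: Sequence[float], y_pred: Sequence[float], n_params: int) -> float:
    """Regression standard deviation sqrt(SS_res / (n - n_params))."""
    ss_res = sum((o - p) ** 2 for o, p in zip(y_obs, y_pred))
    return (ss_res / (len(y_obs) - n_params)) ** 0.5


def demo(seed: int = 0) -> Tuple[List[float], float, float, float]:
    """Fit on a synthetic training set, then score a held-out prediction set."""
    rng = random.Random(seed)
    # Hypothetical intercept and five descriptor weights (not from the paper)
    true_coef = [0.1, 0.5, -0.3, 0.8, 0.2, -0.6]
    X = [[rng.random() for _ in range(5)] for _ in range(200)]
    y = [v + rng.gauss(0.0, 0.05) for v in predict(true_coef, X)]
    X_train, y_train = X[:150], y[:150]
    X_test, y_test = X[150:], y[150:]  # external set, never used in fitting
    coef = fit_mlr(X_train, y_train)
    fitted = predict(coef, X_train)
    r2_train = r_squared(y_train, fitted)
    s_train = std_dev(y_train, fitted, n_params=len(coef))
    r2_test = r_squared(y_test, predict(coef, X_test))
    return coef, r2_train, s_train, r2_test


if __name__ == "__main__":
    coef, r2_train, s_train, r2_test = demo()
    print("coefficients:", [round(c, 3) for c in coef])
    print(f"training R^2 = {r2_train:.3f}, s = {s_train:.3f}; "
          f"prediction-set R^2 = {r2_test:.3f}")
```

Solving the normal equations squares the condition number of the descriptor matrix, so it is adequate for a few well-conditioned descriptors; with many or strongly correlated descriptors, an orthogonal-decomposition solver (QR or SVD) is the safer choice.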
Abraham and co-workers¹ have proposed the general solvation equation to correlate solute properties (SP), such as partitioning,²,³ solubility,⁴ the selectivity of micellar electrokinetic chromatography systems,⁵ blood-brain distribution,⁶ and human intestinal absorption,⁷ with a standard set of five parameters:

$$\log \mathrm{SP} = c + eE + sS + aA + bB + vV \qquad (1)$$

In this equation, E is an excess molar refraction obtained from the refractive index. S is the dipolarity/polarizability, which can be obtained from gas-liquid chromatographic measurements on polar stationary phases or, more generally, from water/solvent partitions. A and B are the overall or effective hydrogen-bond acidity and basicity, respectively, which are most easily obtained from water/solvent partitions. V is the McGowan characteristic volume, which can readily be calculated from atom and bond contributions. These parameters represent the influence of the solute on the various solute/solvent phase interactions. Correspondingly, the coefficients c, e, s, a, b, and v, obtained by multiple linear regression against known log SP values, describe the complementary effect of the phase on these interactions. The coefficients can be regarded as system constants that characterize the phase in question and can be interpreted as follows. The e-coefficient reflects the tendency of the phase to interact with solutes through π- and n-electron pairs; it is usually positive but can be negative for phases that contain fluorine atoms. The s-coefficient represents the tendency of the phase to interact with dipolar/polarizable solutes. The a-coefficient measures the hydrogen-bond basicity of the phase, because acidic solutes interact with basic phases; conversely, the b-coefficient measures the hydrogen-bond acidity of the phase. The v-coefficient is a measure of the hydrophobicity of the phase and describes dispersion interactions and cavitation forces.

* Corresponding author fax: +34934907725; e-mail: joaquim.sales@qi.ub.es.

J. Chem. Inf. Comput. Sci. 2004, 44, 1098-1106. DOI: 10.1021/ci049943w. CCC: $27.50. © 2004 American Chemical Society. Published on Web 04/17/2004.

Any application of the general solvation equation depends on the availability of the solute parameters, so calculating them for new compounds will always be of primary importance. As explained above, the descriptors E and V can be calculated quite simply from structure, but the remaining three descriptors, S, A, and B, have to be determined experimentally, either directly from complexation measurements or indirectly by back-calculation from partition measurements. It is therefore not surprising that several attempts have been made to determine new S, A, and B values without recourse to experimental data. These include the work of Sevcik and co-workers,⁸ who reported multilinear regression and neural network approaches to estimate the S parameter for a set of 333 compounds using 29 molecular descriptors. Platts et al. used ab initio and DFT methods to estimate the S,⁹ A,¹⁰ and B¹¹ Abraham parameters for sets of 50-80 compounds, and more recently the same authors applied DFT methods to estimate the A and B parameters of multifunctional acids and bases.¹² An additive model for estimating all five solute parameters, E, S, A, B, and V, has also been proposed.¹³ This model is built on 81 atom and functional-group fragments and intramolecular interactions, whose contributions to each parameter were evaluated by multiple linear regression. The method gives good results for predicting parameters, but as with all group