Exhaustive QSPR Studies of a Large Diverse Set of Ionic Liquids: How Accurately Can We Predict Melting Points?

Alexandre Varnek* and Natalia Kireeva
Laboratoire d’Infochimie, UMR 7551 CNRS, Université Louis Pasteur, 4, rue B. Pascal, Strasbourg 67000, France

Igor V. Tetko
GSF-Institute for Bioinformatics, Neuherberg D-85764, Germany, and Institute of Bioorganic & Petrochemistry, Kiev, Ukraine

Igor I. Baskin
Department of Chemistry, Moscow State University, Moscow 119992, Russia

Vitaly P. Solov’ev
Institute of Physical Chemistry, Russian Academy of Sciences, Leninskiy prospect 31a, Moscow 119992, Russia

Received November 4, 2006

Several popular machine learning methods, namely Associative Neural Networks (ASNN), Support Vector Machines (SVM), k Nearest Neighbors (kNN), a modified version of partial least-squares analysis (PLSM), a backpropagation neural network (BPNN), and Multiple Linear Regression Analysis (MLR), as implemented in the ISIDA, NASAWIN, and VCCLAB software, have been used to perform QSPR modeling of the melting points of a structurally diverse data set of 717 bromides of nitrogen-containing organic cations (FULL), including 126 pyridinium bromides (PYR), 384 imidazolium and benzoimidazolium bromides (IMZ), and 207 quaternary ammonium bromides (QUAT). Several types of descriptors were tested: E-state indices, counts of atoms determined for E-state atom types, molecular descriptors generated by the DRAGON program, and different types of substructural molecular fragments. The predictive ability of the models was analyzed using a 5-fold external cross-validation procedure in which every compound in the parent set was included in one of five test sets. Among the 16 types of structure-melting point models developed, the nonlinear SVM, ASNN, and BPNN techniques perform slightly better than the other methods. For the full set, the accuracy of predictions does not change significantly as a function of the type of descriptors.
For the other sets, the performance of the descriptors varies as a function of the method and data set used. The root-mean-square error (RMSE) of prediction calculated on independent test sets is in the range of 37.5-46.4 °C (FULL), 26.2-34.8 °C (PYR), 38.8-45.9 °C (IMZ), and 34.2-49.3 °C (QUAT). The moderate accuracy of the predictions can be related to the quality of the experimental data used to obtain the models as well as to the difficulty of accounting for the structural features of ionic liquids in the solid state (polymorphic effects, eutectics, glass formation).

1. INTRODUCTION

Ionic liquids (IL) have received great attention due to their green and tuneable properties. Their negligible vapor pressures allow for their potential use as an alternative to volatile organic solvents. 1,2 Careful choice of the cation/anion combination permits fabrication of IL with physical and chemical properties well fitted to a specific problem. One of the most important physical properties of IL, the melting point (mp), has been the subject of numerous studies (see book 3 and references therein). The melting point, which characterizes the transition from the solid to the liquid state, has a very complex relationship with the structure of the constituent ions because many different factors are involved. 4 Thus, in both the solid and liquid phases, various types of interactions between ions should be taken into account: electrostatic and van der Waals interactions, hydrogen bonds, and aromatic π-π stacking. The symmetry and conformational flexibility of individual species play an important role because they affect the crystal packing and, hence, the melting points. Another problem is related to the phase content of the solids. Unlike high-melting salts, certain types of IL (e.g., halides of imidazolium cations 5) melt from eutectic mixtures of several crystalline polymorphs. Usually, the eutectic temperature is considerably lower than the melting points of the individual polymorphs.
One should also not exclude the formation of glasses instead of crystalline phases, which is quite typical of low-melting IL. 6 In this case, mp represents the glass transition temperature, which is rather different from the melting point of the corresponding crystalline state.

* Corresponding author e-mail: varnek@chimie.u-strasbg.fr; http://infochem.u-strasbg.fr.

J. Chem. Inf. Model. 2007, 47, 1111-1122. DOI: 10.1021/ci600493x. © 2007 American Chemical Society. Published on Web 03/24/2007.
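
The 5-fold external cross-validation procedure mentioned in the abstract, in which every compound appears in exactly one of five test sets and the RMSE is computed on held-out predictions, can be sketched as follows. This is a minimal illustration under stated assumptions, not the protocol implemented in ISIDA, NASAWIN, or VCCLAB: the function names, the plain kNN regressor, the pooling of all test-set predictions into a single RMSE, and the synthetic descriptors are all hypothetical.

```python
import math
import random


def five_fold_splits(n_items, n_folds=5, seed=0):
    """Shuffle indices once and deal them into n_folds disjoint test sets."""
    indices = list(range(n_items))
    random.Random(seed).shuffle(indices)
    # Slicing with a stride puts every index into exactly one fold.
    return [indices[i::n_folds] for i in range(n_folds)]


def knn_predict(train_x, train_y, query, k=3):
    """Predict the mean target of the k nearest training points (Euclidean)."""
    ranked = sorted(range(len(train_x)),
                    key=lambda i: math.dist(train_x[i], query))
    nearest = ranked[:k]
    return sum(train_y[i] for i in nearest) / len(nearest)


def external_cv_rmse(x, y, n_folds=5, k=3, seed=0):
    """Return the RMSE over predictions pooled from all external test folds."""
    folds = five_fold_splits(len(x), n_folds, seed)
    squared_errors = []
    for test_idx in folds:
        held_out = set(test_idx)
        # The model only ever sees compounds outside the current test fold.
        train_idx = [i for i in range(len(x)) if i not in held_out]
        train_x = [x[i] for i in train_idx]
        train_y = [y[i] for i in train_idx]
        for i in test_idx:
            pred = knn_predict(train_x, train_y, x[i], k)
            squared_errors.append((pred - y[i]) ** 2)
    return math.sqrt(sum(squared_errors) / len(squared_errors))


if __name__ == "__main__":
    rng = random.Random(42)
    # Synthetic stand-in data (assumption): two made-up descriptors per
    # "compound" and a noisy linear target; not real melting points.
    x = [(rng.uniform(0, 10), rng.uniform(0, 10)) for _ in range(100)]
    y = [20.0 * a - 5.0 * b + rng.gauss(0, 10) for a, b in x]
    print(f"5-fold external CV RMSE: {external_cv_rmse(x, y):.1f}")
```

Because the folds are disjoint and jointly cover the whole set, each compound is predicted exactly once by a model that never saw it during training, which is what makes the resulting RMSE an estimate of external predictive ability rather than of goodness of fit.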