Prediction of Henry’s Law Constant of Organic Compounds in Water from a New Group-Contribution-Based Model

Farhad Gharagheizi,*,†,‡ Reza Abbasi,‡ and Behnam Tirandazi§

Department of Chemical Engineering, Faculty of Engineering, University of Tehran, P.O. Box 11365-4563, Tehran, Iran, Saman Energy Giti Co., Postal Code 3331619636, Tehran, Iran, and Department of Chemical Engineering, Iran University of Science and Technology, Tehran, Iran

In this work, a new model is presented for estimating the Henry’s law constant of pure compounds in water at 25 °C (H). The model combines a group contribution method with neural networks. Its input parameters are the numbers of occurrences of a new collection of 107 functional groups. On the basis of these 107 functional groups, a feed-forward neural network is presented to estimate the H of pure compounds. The squared correlation coefficient, absolute percent error, standard deviation error, and root-mean-square error of the model over a diverse set of 1940 pure compounds are, respectively, 0.9981, 2.84%, 2.4, and 0.1 (all values obtained from log H based data). The model is therefore comprehensive and accurate, and it can be used to predict the H of a wide range of chemical families of pure compounds in water better than previously presented models.

Introduction

One of the key processes affecting the fate of many organic compounds in the environment is the transfer of chemicals between air and aqueous phases. 1 One of the most important parameters used to describe this transfer is the Henry’s law constant for compounds in water, denoted by H. 1,2 The H is usually referred to as the ratio of a chemical’s concentration in water to its concentration in air, so reliable data for H are needed to track the fate of chemicals in the environment.
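The error statistics quoted in the abstract (squared correlation coefficient, absolute percent error, and root-mean-square error, all on log H) can be computed along the following lines. This is only a minimal sketch: the function name `error_statistics` is hypothetical, the squared correlation coefficient is assumed here to be the squared Pearson correlation, and the four log H values are made up for illustration and do not come from the paper's data set.

```python
import math

def error_statistics(y_true, y_pred):
    """Return (squared Pearson correlation, mean absolute percent error, RMSE)."""
    n = len(y_true)
    mean_true = sum(y_true) / n
    mean_pred = sum(y_pred) / n
    # Squared Pearson correlation (an assumed definition of the reported R^2).
    cov = sum((t - mean_true) * (p - mean_pred) for t, p in zip(y_true, y_pred))
    var_true = sum((t - mean_true) ** 2 for t in y_true)
    var_pred = sum((p - mean_pred) ** 2 for p in y_pred)
    r_squared = cov ** 2 / (var_true * var_pred)
    # Mean absolute percent error, relative to the reference values.
    mape = 100.0 / n * sum(abs((p - t) / t) for t, p in zip(y_true, y_pred))
    # Root-mean-square error.
    rmse = math.sqrt(sum((p - t) ** 2 for t, p in zip(y_true, y_pred)) / n)
    return r_squared, mape, rmse

# Made-up log H values, for illustration only (not taken from the paper).
log_h_true = [-2.0, -4.0, -6.0, 1.0]
log_h_pred = [-2.1, -3.9, -6.2, 1.1]
r2, mape, rmse = error_statistics(log_h_true, log_h_pred)
print(f"R^2 = {r2:.4f}, MAPE = {mape:.2f}%, RMSE = {rmse:.4f}")
```

Note that a percent error taken on log H is undefined for a reference value of exactly zero, so a real evaluation would need to handle that case explicitly.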
Generally, accurate measurements of the H are difficult and expensive because minute amounts of solute adsorb on the wall of the apparatus and because very hydrophobic compounds are present at concentrations near the analytical detection limits. 3,4 An accurate estimation method for the H is therefore of great importance.

A number of methods have been presented to estimate the H of pure compounds in water directly from chemical structure. There are also indirect methods that estimate H from other vapor-liquid equilibrium data such as activity coefficients, 5,6 but the performance of those methods for estimating H has not been rigorously evaluated, so in the present work we focus on methods that estimate the H of pure compounds directly from their chemical structure. These correlations can be classified into two main classes based on the type of parameters they use.

Class 1 includes those correlations that use other physical properties, such as the vapor pressure and aqueous solubility of the compound, to estimate the H. The best-known method of this class is the correlation presented by Mackay et al. 7 These correlations have some important disadvantages. Their accuracy depends directly on the accuracy of the required physical properties, or of the methods used to estimate those properties. Furthermore, if even one of the required properties is missing, the H cannot be calculated.

Class 2 contains the correlations called quantitative structure property relationships (QSPR), which use only molecular-based parameters to predict the H. The best-known correlations of this class are those presented by Hine and Mookerjee, 8 Meylan and Howard, 9,10 Abraham et al., 11 Katritzky et al., 12 Dearden et al., 13-15 English and Carrol, 16 Yao et al., 17 Lin and Sandler, 18 and Yafe et al.
19 The most important disadvantage of most of these correlations is the complex procedure needed to compute their molecular-based parameters, so they are usually not simple to apply. The simplest correlations of this class appear to be the group contribution (GC) methods. In methods of this type, the numbers of occurrences of several functional groups are used to estimate various physical properties.

In this study, a new comprehensive model is presented to estimate the H of pure compounds by combining a new collection of functional groups, used as the parameters of the model, with neural networks, used to develop the model.

Materials and Methods

Materials. The comprehensiveness of a molecular-based model is directly related to the comprehensiveness of the data set of compounds used in its development. This comprehensiveness includes both the diversity of the chemical families covered and the number of compounds available in the data set. Our literature survey showed that one of the most comprehensive data sets presented for the H of pure compounds is the compilation provided by Yaws, 20 so the 1940 pure compounds found in the handbook and their H values were extracted and used as the main data set in this work. It should be noted that the H values were compiled in units of atm·m³·mol⁻¹ (mol fraction basis) and are presented as the decimal logarithm of H at 25 °C. The values range from -13.461 to 6.238. On the basis of our literature survey, this database is the most comprehensive data set that has ever been used for developing a model for the prediction of the H of pure compounds in water. This data set is presented as Supporting Information.

Developing New Group Contributions. After the data set was assembled, the chemical structures of all 1940 compounds were

* To whom correspondence should be addressed. Fax: +98 21 77926580. E-mail: fghara@ut.ac.ir; fghara@gmail.com.
† University of Tehran.
‡ Saman Energy Giti Co.
§ Iran University of Science and Technology.

Ind. Eng. Chem. Res. 2010, 49, 10149–10152. DOI: 10.1021/ie101532e. © 2010 American Chemical Society. Published on Web 09/21/2010.
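The modeling idea described above, in which the numbers of occurrences of functional groups serve as inputs to a feed-forward neural network that returns log H, can be sketched as follows. This is an illustration under stated assumptions, not the authors' model: the three-group vocabulary `GROUPS`, the single tanh hidden layer with two neurons, and all weight values are hypothetical, since the 107 groups and the fitted network are not given in this section.

```python
import math

# Hypothetical group vocabulary; the paper uses 107 groups not listed in this section.
GROUPS = ["CH3", "CH2", "OH"]

def group_vector(counts):
    """Turn a {group: occurrences} mapping into a fixed-order input vector."""
    return [float(counts.get(g, 0)) for g in GROUPS]

def feed_forward(x, w_hidden, b_hidden, w_out, b_out):
    """One tanh hidden layer followed by a single linear output neuron."""
    hidden = [
        math.tanh(sum(w * xi for w, xi in zip(row, x)) + b)
        for row, b in zip(w_hidden, b_hidden)
    ]
    return sum(w * h for w, h in zip(w_out, hidden)) + b_out

# Illustrative, untrained weights (assumed); real values would come from
# fitting the network to measured log H data.
W_HIDDEN = [[0.1, -0.2, 0.5], [-0.3, 0.1, 0.4]]
B_HIDDEN = [0.0, 0.1]
W_OUT = [1.5, -2.0]
B_OUT = -0.5

# Ethanol written as group occurrences: CH3-CH2-OH.
ethanol = {"CH3": 1, "CH2": 1, "OH": 1}
x = group_vector(ethanol)
log_h_estimate = feed_forward(x, W_HIDDEN, B_HIDDEN, W_OUT, B_OUT)
print(x, log_h_estimate)
```

Because the weights are arbitrary, the printed number has no physical meaning. The sketch only shows the data flow: group counts form a fixed-length vector, and the network maps that vector to a single log H estimate.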