Accurate Prediction of Enthalpies of Formation for a Large Set of Organic Compounds

CUN-XI LIU,1 HAI-XIA WANG,1 ZE-RONG LI,1 CHONG-WEN ZHOU,2 HAN-BING RAO,1 XIANG-YUAN LI2

1 College of Chemistry, Sichuan University, Chengdu 610065, People’s Republic of China
2 College of Chemical Engineering, Sichuan University, Chengdu 610065, People’s Republic of China

Received 5 August 2009; Revised 21 February 2010; Accepted 25 February 2010
DOI 10.1002/jcc.21550
Published online 6 May 2010 in Wiley Online Library (wileyonlinelibrary.com).

Abstract: This article describes a multiparameter calibration model that improves the accuracy of density functional theory (DFT) for the prediction of standard enthalpies of formation for a large set of organic compounds. The model applies atom-based, bond-based, electronic, and radical environmental correction terms to calibrate the enthalpies of formation calculated at the B3LYP/6-31G(d,p) level by a least-squares method. A diverse data set of 771 closed-shell compounds and radicals is used to train the model. The final model gives a leave-one-out cross-validation squared correlation coefficient $q^2$ of 0.84 and a squared correlation coefficient $r^2$ of 0.86. The mean absolute error in enthalpies of formation for the data set is reduced from 4.9 kcal/mol before calibration to 2.1 kcal/mol after calibration. Five-fold cross validation is also used to estimate the performance of the calibration model, and similar results are obtained.

© 2010 Wiley Periodicals, Inc. J Comput Chem 31: 2585–2592, 2010

Key words: enthalpy of formation; organic compounds; DFT; least-square

Introduction

The accurate prediction of molecular thermochemical properties, especially the enthalpy of formation ($\Delta_f H^{0}_{298}$), is one of the vital goals of quantum chemical methods.
A series of composite methods, such as the Gaussian-n (n = 1–4) theories [1–8] and the complete basis set (CBS) methods of Petersson and coworkers [9–12] (e.g., CBS-Q, CBS-QB3, and CBS-APNO), have been successfully used to calculate enthalpies of formation. The Gn theories combine a set of calculations at different levels of accuracy and with different basis sets, with the goal of approaching the exact energy. In the most recent G4 scheme [8], the mean absolute deviation (MAD) from experimental enthalpies of formation over the G3/05 test set [7] (which contains 270 molecules whose experimental $\Delta_f H^{0}_{298}$ values are accurately known) is 0.80 kcal/mol, which is within chemical accuracy and a significant improvement over G3 theory (1.19 kcal/mol). In addition to the Gaussian-n methods and the CBS procedures, many other model chemistry methods have been developed for the accurate calculation of thermochemical properties, such as the correlation consistent composite approach (ccCA) proposed by DeYonker et al. [13], which contains no semiempirical or optimized parameters; the focal point method of Allen and coworkers [14]; the Weizmann (Wn) family of methods of Martin and coworkers [15–18]; the High Accuracy Extrapolated ab initio Thermochemistry (HEAT) method of Stanton and coworkers [19–21]; and the multicoefficient correlation method (MCCM) developed by Truhlar and coworkers [22–24]. A detailed discussion of the performance of these methods is beyond the scope of this article, but all of them reach or exceed chemical accuracy (<1 kcal/mol).
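To make the error measures used here concrete (the MAD quoted for G4 and G3, and the least-squares calibration with leave-one-out $q^2$ summarized in the abstract), the following is a minimal sketch on synthetic data. It is not the authors' model: the design matrix `X` of per-molecule correction-term counts, the number of terms, the coefficients, and all numerical values are hypothetical, and $q^2$ is taken as 1 − PRESS/SS_tot, which is a common definition assumed here.

```python
import numpy as np

def mean_abs_error(pred, ref):
    """Mean absolute deviation between predicted and reference values."""
    return float(np.mean(np.abs(np.asarray(pred) - np.asarray(ref))))

def fit_corrections(X, residual):
    """Least-squares coefficients c minimizing ||X c - residual||^2."""
    coef, *_ = np.linalg.lstsq(X, residual, rcond=None)
    return coef

def loo_q2(X, dft, exp):
    """Leave-one-out q^2 = 1 - PRESS / SS_tot (assumed definition)."""
    n = len(exp)
    press = 0.0
    for i in range(n):
        keep = np.arange(n) != i            # drop molecule i from training
        coef = fit_corrections(X[keep], exp[keep] - dft[keep])
        pred_i = dft[i] + X[i] @ coef       # predict the held-out molecule
        press += (exp[i] - pred_i) ** 2
    ss_tot = np.sum((exp - exp.mean()) ** 2)
    return float(1.0 - press / ss_tot)

# Synthetic stand-in data (hypothetical; kcal/mol).
rng = np.random.default_rng(0)
n_mol, n_terms = 60, 4
X = rng.integers(0, 6, size=(n_mol, n_terms)).astype(float)  # term counts
true_coef = np.array([0.8, -0.5, 1.2, 0.3])                  # made-up values
exp = rng.normal(-20.0, 30.0, size=n_mol)                    # "experiment"
dft = exp - X @ true_coef + rng.normal(0.0, 1.0, size=n_mol) # "raw DFT"

# Calibrate: raw DFT value plus a linear combination of correction terms.
coef = fit_corrections(X, exp - dft)
calibrated = dft + X @ coef

mae_before = mean_abs_error(dft, exp)
mae_after = mean_abs_error(calibrated, exp)
q2 = loo_q2(X, dft, exp)
print(f"MAE before: {mae_before:.2f}, after: {mae_after:.2f}, q2: {q2:.3f}")
```

Fitting the corrections to the residual (experiment minus raw DFT) keeps the DFT value itself unscaled, which is one simple way to express an additive calibration; other parameterizations are possible.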
Additional Supporting Information may be found in the online version of this article.
Correspondence to: Z.-R. Li; e-mail: lizerong@scu.edu.cn
Contract/grant sponsor: National Natural Science Foundation of China; contract/grant number: 20973118

An alternative and accurate approach for the calculation of thermochemical data is based on the coupled-cluster (CC) scheme, using either single and double excitations augmented by a perturbative treatment of triple excitations (CCSD(T)) [25] or the full coupled-cluster singles, doubles, and triples method (CCSDT) [26], together with the Dunning correlation-consistent basis sets (aug-cc-pVDZ through aug-cc-pV5Z) [27]. However, the application of CC methods to larger chemical systems is limited by the rapid increase in computational effort with the number of electrons and basis functions. Nowadays, efficient implementations allow calculations at the CCSD(T) level of theory with up to 800 basis functions [28]. In the work of Dixon and coworkers [29], the largest calculation performed was the CCSD(T) calculation on octane with 1468 basis