Contents lists available at ScienceDirect
Energy Conversion and Management
journal homepage: www.elsevier.com/locate/enconman

Comparison of Support Vector Machine and Extreme Gradient Boosting for predicting daily global solar radiation using temperature and precipitation in humid subtropical climates: A case study in China

Junliang Fan a, Xiukang Wang b, Lifeng Wu c,*, Hanmi Zhou d, Fucang Zhang a,e, Xiang Yu f, Xianghui Lu c, Youzhen Xiang a

a Institute of Water-saving Agriculture in Arid Areas of China, Northwest A&F University, Yangling 712100, China
b College of Life Sciences, Yanan University, Yanan 716000, China
c School of Hydraulic and Ecological Engineering, Nanchang Institute of Technology, Nanchang 330099, China
d College of Agricultural Engineering, Henan University of Science and Technology, Luoyang 471003, China
e Key Laboratory of Agricultural Soil and Water Engineering in Arid and Semiarid Areas of Ministry of Education, Northwest A&F University, Yangling 712100, China
f Provincial Key Laboratory for Water Information Cooperative Sensing and Intelligent Processing, Nanchang Institute of Technology, Nanchang 330099, China

ARTICLE INFO
Keywords: Global solar radiation; Support Vector Machine; Extreme Gradient Boosting; Temperature; Precipitation

ABSTRACT
Knowledge of global solar radiation (H) is a prerequisite for the use of renewable solar energy, but H measurements are often unavailable because of high costs and technical complexities. The present study proposes two machine learning algorithms, Support Vector Machine (SVM) and Extreme Gradient Boosting (XGBoost), a novel and simple tree-based ensemble method, for accurate prediction of daily H using limited meteorological data.
Daily H, maximum and minimum air temperatures (Tmax and Tmin), transformed precipitation (Pt: 1 for rainfall > 0 and 0 for rainfall = 0) and extra-terrestrial solar radiation (H0) from three radiation stations in humid subtropical China were used. Data from 1966–2000 served to train the models, and data from 2001–2015 served to test them. Two combinations of input parameters were considered for the simulations: (i) only Tmax, Tmin and H0, and (ii) the complete data set (Tmax, Tmin, Pt and H0). The proposed machine learning models were also compared with four well-known empirical models to evaluate their performance. The results suggest that the SVM and XGBoost models outperformed the selected empirical models. Including precipitation information improved the performance of the machine learning models in terms of RMSE by 5.9–12.2% in the training phase and by 8.0–11.5% in the testing phase. Compared with the SVM model, the XGBoost model generally showed better accuracy in the training phase, and slightly weaker but comparable accuracy in the testing phase. However, the XGBoost model was more stable, with an average RMSE increase of 6.3% from the training to the testing phase, compared with 10.5% for the SVM algorithm. The XGBoost model was also much faster to compute (3.02 s and 0.05 s for the training and testing phases, respectively) than the SVM model (27.48 s and 4.13 s, respectively). Considering prediction accuracy, model stability and computational efficiency together, the XGBoost model is highly recommended for estimating daily H in humid subtropical climates from commonly available temperature and precipitation data.

1. Introduction

Accurate estimation of global solar radiation (H) is of great importance for the design and optimization of solar energy systems [61,57,54,34]. However, unlike other meteorological data (e.g.
temperature and precipitation), global solar radiation is not measured at many locations worldwide because of high costs and technical complexities [33]. Therefore, various approaches have been proposed to predict H where global solar radiation data are lacking. These include empirical models [12,35,1], artificial intelligence-based models [15,17,52] and satellite-based methods [36,67,8]. Among these methods, empirical models and artificial intelligence-based models are the most commonly used. Empirical models are favored for their simplicity, and artificial intelligence-based models for their high prediction accuracy [61,30,33,24].

https://doi.org/10.1016/j.enconman.2018.02.087
Received 4 January 2018; Received in revised form 25 February 2018; Accepted 26 February 2018
* Corresponding author. E-mail address: china.sw@163.com (L. Wu).
Energy Conversion and Management 164 (2018) 102–111
0196-8904/ © 2018 Elsevier Ltd. All rights reserved.

Over the past few decades, many efforts have been made to predict H with different types of empirical models, e.g. sunshine-based models [4,9,10], cloudiness-based models [27,7,39], temperature-based models [44,68,30], day number-based models [41,37,53] and hybrid