Energy Conversion and Management
Comparison of Support Vector Machine and Extreme Gradient Boosting for
predicting daily global solar radiation using temperature and precipitation
in humid subtropical climates: A case study in China
Junliang Fan a, Xiukang Wang b, Lifeng Wu c,⁎, Hanmi Zhou d, Fucang Zhang a,e, Xiang Yu f, Xianghui Lu c, Youzhen Xiang a
a Institute of Water-saving Agriculture in Arid Areas of China, Northwest A&F University, Yangling 712100, China
b College of Life Sciences, Yan’an University, Yan’an 716000, China
c School of Hydraulic and Ecological Engineering, Nanchang Institute of Technology, Nanchang 330099, China
d College of Agricultural Engineering, Henan University of Science and Technology, Luoyang 471003, China
e Key Laboratory of Agricultural Soil and Water Engineering in Arid and Semiarid Areas of Ministry of Education, Northwest A&F University, Yangling 712100, China
f Provincial Key Laboratory for Water Information Cooperative Sensing and Intelligent Processing, Nanchang Institute of Technology, Nanchang 330099, China
ARTICLE INFO
Keywords:
Global solar radiation
Support Vector Machine
Extreme Gradient Boosting
Temperature
Precipitation
ABSTRACT
Knowledge of global solar radiation (H) is a prerequisite for the use of renewable solar energy, but H measurements are often unavailable owing to high costs and technical complexities. The present study proposes two machine learning algorithms, i.e. Support Vector Machine (SVM) and a novel, simple tree-based ensemble method named Extreme Gradient Boosting (XGBoost), for accurate prediction of daily H using limited meteorological data. Daily H, maximum and minimum air temperatures (T_max and T_min), transformed precipitation (P_t, 1 for rainfall > 0 and 0 for rainfall = 0) and extra-terrestrial solar radiation (H_0) during 1966–2000 and 2001–2015 from three radiation stations in humid subtropical China were used to train and test the models, respectively. Two combinations of input parameters, i.e. (i) only T_max, T_min and H_0, and (ii) complete data, were considered for simulations. The proposed machine learning models were also compared with four well-known empirical models to evaluate their performance. The results suggest that the SVM and XGBoost models outperformed the selected empirical models. The performance of the machine learning models improved by 5.9–12.2% for the training phase and by 8.0–11.5% for the testing phase in terms of RMSE when precipitation information was further included. Compared with the SVM model, the XGBoost model generally showed better performance for the training phase, and slightly weaker but comparable performance for the testing phase in terms of accuracy. However, the XGBoost model was more stable, with an average increase of 6.3% in RMSE, compared to 10.5% for the SVM algorithm. Also, the XGBoost model (3.02 s and 0.05 s for the training and testing phases, respectively) showed much higher computation speed than the SVM model (27.48 s and 4.13 s for the training and testing phases, respectively). By jointly considering prediction accuracy, model stability and computational efficiency, the XGBoost model is highly recommended for estimating daily H from commonly available temperature and precipitation data in humid subtropical climates.
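The data preparation and error measures mentioned in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: all function names are hypothetical, the record layout (year, value) is an assumption, and the relative RMSE change is assumed here to be measured from the training phase to the testing phase, which the abstract does not state explicitly.

```python
import math


def transform_precipitation(rainfall):
    """Return P_t for each day: 1 if rainfall > 0, otherwise 0."""
    return [1 if r > 0 else 0 for r in rainfall]


def split_by_year(records, last_training_year=2000):
    """Split (year, value) records into training and testing sets.

    Years up to and including last_training_year go to training
    (1966-2000 in the abstract); later years go to testing (2001-2015).
    """
    train = [rec for rec in records if rec[0] <= last_training_year]
    test = [rec for rec in records if rec[0] > last_training_year]
    return train, test


def rmse(observed, predicted):
    """Root mean square error between two equally long sequences."""
    if len(observed) != len(predicted) or len(observed) == 0:
        raise ValueError("inputs must be non-empty and of equal length")
    squared = [(o - p) ** 2 for o, p in zip(observed, predicted)]
    return math.sqrt(sum(squared) / len(squared))


def relative_change_percent(reference, value):
    """Percentage change of value relative to reference.

    Assumption: used here to express how much RMSE grows from the
    training phase (reference) to the testing phase (value).
    """
    return 100.0 * (value - reference) / reference
```

For example, `transform_precipitation([0.0, 2.5, 0.0, 0.1])` yields one indicator per day, and `relative_change_percent(train_rmse, test_rmse)` gives a stability figure comparable in form to the percentages quoted in the abstract; a smaller value indicates a model whose error grows less on unseen data.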
1. Introduction
Accurate estimation of global solar radiation (H) is of great importance for the design and optimization of solar energy systems [61,57,54,34]. However, unlike other meteorological data (e.g. temperature and precipitation), measurements of global solar radiation are often unavailable at many locations worldwide owing to high costs and technical complexities [33]. Therefore, various approaches have been proposed to predict H where global solar radiation data are lacking, e.g. empirical models [12,35,1], artificial intelligence-based models [15,17,52] and satellite-based methods [36,67,8]. Among these methods, empirical and intelligence-based models are most commonly used owing to their model simplicity and high prediction accuracy, respectively [61,30,33,24].
Over the past few decades, many efforts have been made to predict
H using different types of empirical models, e.g. sunshine-based models
[4,9,10], cloudiness-based models [27,7,39], temperature-based
models [44,68,30], day number-based models [41,37,53] and hybrid
https://doi.org/10.1016/j.enconman.2018.02.087
Received 4 January 2018; Received in revised form 25 February 2018; Accepted 26 February 2018
⁎ Corresponding author.
E-mail address: china.sw@163.com (L. Wu).
Energy Conversion and Management 164 (2018) 102–111
0196-8904/ © 2018 Elsevier Ltd. All rights reserved.