Machine Learning Algorithms in Forecasting of Photovoltaic Power Generation

Di Su, Efstratios Batzelis, Bikash Pal
Electrical and Electronic Engineering
Imperial College London
London, UK
{di.su17, e.batzelis, b.pal}@imperial.ac.uk

Abstract: Due to the intrinsic intermittency and stochastic nature of solar power, accurate forecasting of photovoltaic (PV) generation is crucial for the operation and planning of PV-intensive power systems. Several PV forecasting methods based on machine learning algorithms have recently emerged, but a complete assessment of their performance on a common framework is still missing from the literature. In this paper, a comprehensive comparative analysis is performed, evaluating ten recent neural networks and intelligent algorithms from the literature in short-term PV forecasting. All methods are fine-tuned and assessed on a one-year dataset of a 406 MWp PV plant in the UK. Furthermore, a new hybrid prediction strategy is proposed and evaluated, derived as an aggregation of the best-performing forecasting models. Simulation results in MATLAB show that the season of the year affects the accuracy of all methods, with the proposed hybrid method performing most favorably overall.

Keywords: Forecasting, photovoltaic, machine learning, neural networks, intelligent algorithms.

I. INTRODUCTION

The UK targets for very high photovoltaic (PV) integration into the power system necessitate reliable forecasting of the stochastic and highly uncertain PV power generation. This is important for power system stability and for keeping PV power curtailment low. Recently, machine learning algorithms have emerged as powerful tools for predicting PV power generation, as they avoid modelling complex atmospheric phenomena and focus instead on actual operating data. Artificial Neural Networks (ANN) are widely used in this context; some of the recent forecasting methods are discussed in the following.
A Back-Propagation Neural Network (BPNN) is adopted in [1] for 24-hour-ahead solar power forecasting, while the study in [2] explores a Non-linear Auto Regressive Neural Network with Exogenous Inputs (NARXNN) to predict the PV generation power of a standalone microgrid on a remote island. The authors of [3] achieve 72-hour-ahead PV power forecasting using an Elman Neural Network (ENN), and [4] presents a Generalized Regression Neural Network (GRNN) combined with Wavelet Transform (WT) for short-term PV power forecasting. A Fuzzy Neural Network (FNN) for PV power estimation is proposed in [5].

Another large class of solar power forecasting methods is based on Intelligent Algorithms (IA). Extreme Learning Machine (ELM) is used in [6] to predict the PV power output multiple steps ahead, while a Random Forest (RF) model is adopted in [7] for day-ahead hourly PV power forecasting. The study in [8] estimates the PV power output of a 1 MW plant based on Support Vector Regression (SVR) and investigates the effect of cloudiness on the forecasting performance. SVR is also employed in [9], which proposes a method for selecting the SVR parameters that minimize the estimation error. A comparison of the K-Nearest-Neighbours (KNN) and SVR methods on actual measurements and Numerical Weather Prediction (NWP) data is given in [10]; feature extraction is performed, yielding the ten best features to be used as the model's inputs.

A literature review reveals that machine learning approaches are generally superior to conventional statistical methods due to their inherent ability to model any non-linear, complex and dynamic process. However, training an ANN or IA is complicated and there is still no commonly accepted way to construct the perfect model; this is why selecting and optimizing the model's parameters is usually a trial-and-error process.
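As an illustration of such a trial-and-error search, the following Python sketch tries several candidate values of one parameter (the number of neighbours k in a KNN regressor) and keeps the one with the lowest error on held-out data. It is a minimal sketch under stated assumptions, not the tuning procedure of this paper or of [10]: the data are synthetic, the single input feature (hour of day) is chosen only for simplicity, and the helper names `knn_predict`, `rmse` and `select_k` are hypothetical.

```python
import math
import random


def knn_predict(train_x, train_y, x, k):
    """Average the targets of the k training points nearest to x."""
    nearest = sorted(range(len(train_x)), key=lambda i: abs(train_x[i] - x))[:k]
    return sum(train_y[i] for i in nearest) / k


def rmse(actual, predicted):
    """Root-mean-square error between two equally long sequences."""
    return math.sqrt(sum((a - p) ** 2 for a, p in zip(actual, predicted)) / len(actual))


def select_k(train_x, train_y, val_x, val_y, candidates):
    """Try every candidate k; return (best_k, {k: validation RMSE})."""
    scores = {}
    for k in candidates:
        predictions = [knn_predict(train_x, train_y, x, k) for x in val_x]
        scores[k] = rmse(val_y, predictions)
    best_k = min(scores, key=scores.get)
    return best_k, scores


# Synthetic toy data (an assumption, NOT the plant dataset): a noisy
# bell-shaped daily profile that is zero outside 06:00-18:00.
rng = random.Random(0)
hours = [rng.uniform(0.0, 24.0) for _ in range(200)]
power = [max(0.0, math.sin(math.pi * (h - 6.0) / 12.0)) + rng.gauss(0.0, 0.1)
         for h in hours]

# Simple hold-out split: first 150 samples for training, last 50 for validation.
train_x, train_y = hours[:150], power[:150]
val_x, val_y = hours[150:], power[150:]

candidates = [1, 3, 5, 10, 20, 50]
best_k, scores = select_k(train_x, train_y, val_x, val_y, candidates)
print("validation RMSE per k:", scores)
print("selected k:", best_k)
```

The same loop generalizes to several parameters by iterating over all combinations of candidate values, at a cost that grows with the size of the search grid. For time-ordered measurements, a chronological split is usually preferable to the random one used here.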
Most of the relevant studies in the literature examine only a few machine learning methodologies, focus on short-term (up to three days ahead) forecasting and do not provide sufficient details on how the model's parameters are found; a comprehensive comparative analysis that accounts for all relevant methods and longer look-ahead times is still missing from the literature. Furthermore, the various studies consider different real-world installations with dissimilar plant specifications, locations, time periods, weather conditions and datasets, while there is no consistent way to select the model training variables and error metrics. To this day, these methods have not been assessed simultaneously on a common evaluation framework.

In this paper, ten different machine learning algorithms for six-day-ahead PV power forecasting are implemented and compared; these include six ANN and four IA methods. A brief discussion is provided on the parameter tuning and performance evaluation of each method. Furthermore, a new hybrid prediction strategy is proposed, based on some of the best-performing models, and is included in the comparison to evaluate its effectiveness. All simulations are carried out in MATLAB, using a dataset of one-year hourly measurements from a 406 MWp PV park in the UK. This is the first study in the literature to perform such an assessment and performance comparison on a common evaluation framework and for medium-term horizons of six days.

The rest of the paper is organized as follows. The dataset used is described in Section II, while the ten forecasting methods and the proposed hybrid approach are presented in Section III. The overall performance is discussed in Section IV, followed by the conclusions in Section V.

II. CASE STUDY AND DATASET

A. Plant Specifications

The selected PV power plant has an installed capacity of 406 MWp and is connected to the Norwich Main Substation (Norfolk, England, UK). As shown in Fig. 1, this plant has a favourable position in terms of solar radiation and can generate more electrical power than the majority of other PV stations in the UK. The original training dataset is jointly provided by Sheffield Solar [11], Copernicus Atmosphere Monitoring Service (CAMS) [12] and MERRA-2 [13].