Forecasting Solar Power Generation on the basis of Predictive and Corrective Maintenance Activities

Soham Vyas, Department of Computer Science & Engineering, PDEU, Gandhinagar, India, Soham.vmtds21@sot.pdpu.ac.in
Sanskar Bhuwania, Department of Computer Science & Engineering, PDEU, Gandhinagar, sanskar.bce18@sot.pdpu.ac.in
Brijesh Tripathi, Department of Solar Energy, PDEU, Gandhinagar, India, brijesh.tripathi@sot.pdpu.ac.in
Yuvraj Goyal, Department of Computer Science & Engineering, PDEU, Gandhinagar, India, Yuvraj.gmtds21@sot.pdpu.ac.in
Hardik Patel, Department of Information and Communication Technology, PDEU, Gandhinagar, India, Hardik.patel@sot.pdpu.ac.in
Neel Bhatt, Department of Computer Science & Engineering, PDEU, Gandhinagar, India, Neel.bmtds20@sot.pdpu.ac.in
Shakti Mishra, Department of Computer Science & Engineering, PDEU, Gandhinagar, India, Shakti.mishra@sot.pdpu.ac.in

Abstract—Solar energy forecasting has seen tremendous growth in the last decade, using historical time series of weather variables collected from a weather station, such as wind speed and direction, solar irradiance, and temperature. It helps in the overall management of solar power plants. However, a solar power plant regularly requires preventive and corrective maintenance activities that further impact energy production. This paper presents a novel approach to forecasting solar energy production based on maintenance activities, problems observed at a power plant, and weather data. The results are obtained on a dataset from the 1 MW solar power plant of PDEU (our university), which contains 13 columns of daily entries from 2012 to 2020. There are 12 structured columns and one unstructured column with daily manual text entries about maintenance activities, problems observed, and weather conditions. The unstructured column is converted into a new feature column vector using a hash map, flag words, and stop words.
The final dataset comprises five important feature vector columns selected through correlation and causality analysis. Random forest regression is then used to compute the impact of maintenance activities on the total energy output. The causality and correlation analysis shows that the five feature vectors are interdependent time series variables. Vector Autoregression (VAR) is therefore chosen to forecast them simultaneously, giving total power generation forecasts for 3, 5, 7, 10, 12, and 30 days ahead. The results show that the root mean square percentage error (RMSPE) in total power generation forecasting is less than 10% across these horizons. This research shows that the spikes in total power generation forecasting can be traced and tracked better using daily maintenance activities, observed problems, and weather conditions.

Keywords—forecasting, vector autoregression, maintenance activities, solar power generation, weather conditions

I. INTRODUCTION

Solar power generation has the potential to mitigate climate change by reducing the carbon footprint. Its market penetration has improved in recent years because of growing awareness of clean and green energy and its affordable cost. Solar power plants require various planned and unplanned maintenance activities for better energy output. These maintenance activities include PV module cleaning and maintenance, PV module positioning in the field, inverter maintenance, etc. Solar energy forecasting is usually done using past time series data, such as wind pressure, humidity, and temperature acquired from weather stations, as well as satellite imagery. In this research work, total solar power generation forecasting is proposed using different maintenance activities, problems observed, and weather data. Next, we carried out a literature survey to understand the contemporary work done in this area.
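To make the text-derived maintenance feature concrete, the following is a minimal sketch of how a free-text daily log entry could be mapped to numeric counts with a hash map of flag words after stop-word removal. It is an illustration under stated assumptions, not the authors' implementation: the names `STOP_WORDS`, `FLAG_WORDS`, and `log_entry_to_features`, the three categories, and the example vocabulary are all hypothetical, since the actual word lists are not given here.

```python
import re
from typing import Dict

# Hypothetical vocabularies: the actual flag and stop words are not listed in the text.
STOP_WORDS = {"the", "a", "an", "of", "and", "was", "were", "in", "on", "at", "to", "due"}

# Hash map from a flag word to an assumed category of daily event.
FLAG_WORDS = {
    "cleaning": "maintenance",
    "cleaned": "maintenance",
    "replaced": "maintenance",
    "inverter": "problem",
    "tripped": "problem",
    "fault": "problem",
    "cloudy": "weather",
    "rain": "weather",
}


def log_entry_to_features(entry: str) -> Dict[str, int]:
    """Count flag words per category in one free-text daily log entry."""
    counts = {"maintenance": 0, "problem": 0, "weather": 0}
    # Lowercase and keep alphabetic tokens only (digits and punctuation are dropped).
    tokens = re.findall(r"[a-z]+", entry.lower())
    for token in tokens:
        if token in STOP_WORDS:
            continue  # stop words carry no signal for this feature
        category = FLAG_WORDS.get(token)
        if category is not None:
            counts[category] += 1
    return counts


features = log_entry_to_features(
    "Module cleaning done; inverter 3 tripped due to cloudy weather"
)
print(features)
```

Counts per category are only one possible encoding; a binary flag per day or a single combined score would fit the same dictionary lookup.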
Fuzzy logic, AI models, and genetic algorithms are used to predict and model solar radiation and the sizing, performance, and control of solar photovoltaic (PV) systems in [1]. An ensemble of deep ConvNets is proposed in [2] for multistep solar forecasting without additional time series models such as RNN or LSTM and without exogenous variables, achieving 22.5% RMSE. A Mycielski-Markov model is used to forecast short-term solar power generation in [3] with 32.65% RMSE. In [4], feedforward neural network-based solar irradiance prediction is followed by LSTM-based short-term solar power generation prediction, with an average RMSE of 98.70. An ensemble approach based on long short-term memory (LSTM), gated recurrent unit (GRU), Autoencoder LSTM (Auto-LSTM), and Auto-GRU is proposed for solar power generation forecasting in [5], without considering any maintenance activities. Generic fault/status prediction and specific fault prediction are performed with unsupervised clustering and a neural network, using data from a 10 MW solar power plant with one hundred inverters of three different technology brands [6]. This model can predict generic faults up to 7 days in advance with 95% sensitivity, and specific faults from a few hours up to 7 days in advance [6]. Intra hour, short