1551-3203 (c) 2019 IEEE. Personal use is permitted, but republication/redistribution requires IEEE permission. See http://www.ieee.org/publications_standards/publications/rights/index.html for more information. This article has been accepted for publication in a future issue of this journal, but has not been fully edited. Content may change prior to final publication. Citation information: DOI 10.1109/TII.2019.2916566, IEEE Transactions on Industrial Informatics

An Optimized Offline Random Forests-Based Model for Ultra-short-term Prediction of PV Characteristics

Ibrahim Anwar Ibrahim, Member, IEEE, M. J. Hossain, Senior Member, IEEE, and Benjamin C. Duck

Abstract—The fluctuation of meteorological data causes random changes in photovoltaic (PV) performance, which may negatively affect the stability and reliability of the electrical grid. This paper proposes a new ultra-short-term offline hybrid prediction model for PV I-V characteristic curves based on the dynamic characteristics of the meteorological data on a 15-min basis. The proposed hybrid prediction model combines the random forests (RFs) prediction technique with the ant-lion optimizer (ALO). ALO is used to optimize the hyper-parameters of the RFs model, with the aim of improving its accuracy and computational time. The performance of the proposed hybrid prediction model is compared with that of conventional RFs, RFs-iteration, generalized regression neural network (GRNN), GRNN-iteration, GRNN-ALO, cascade-forward neural network (CFNN), CFNN-iteration, CFNN-ALO, feed-forward neural network (FFNN), FFNN-iteration and FFNN-ALO models. The results show that the root-mean-squared error (RMSE), mean bias error (MBE) and mean absolute percentage error (MAPE) of the proposed model for I-V characteristic-curve prediction are 0.0091 A, 0.0028 A and 0.1392%, respectively, with an accuracy of 99.86%.
Moreover, the optimization, training and testing times are 162.15 s, 10.1919 s and 0.1237 s, respectively. Therefore, the proposed model performs better than the aforementioned models and the other existing models in the literature. Accordingly, the proposed hybrid (RFs-ALO) offline model can significantly improve the accuracy of PV performance prediction, especially in grid-connected PV system applications.

Index Terms—I-V curve, photovoltaic, prediction, random forests technique, ant-lion optimizer.

Manuscript received Month xx, 2xxx; revised Month xx, xxxx; accepted Month x, xxxx. (Corresponding author: Ibrahim Anwar Ibrahim.)
Ibrahim Anwar Ibrahim and M. J. Hossain are with the School of Engineering, Faculty of Science and Engineering, Macquarie University, Sydney, NSW 2109, Australia (e-mail: ibrahim.a.ibrahim@hdr.mq.edu.au; jahangir.hossain@mq.edu.au).
Benjamin C. Duck is with CSIRO Energy, Mayfield West, NSW 2300, Australia (e-mail: benjamin.duck@csiro.au).

NOMENCLATURE

Abbreviations
ALO      Ant-Lion Optimizer
AI       Artificial Intelligence
ANN      Artificial Neural Network
CFNN     Cascade-Forward Neural Network
CPU      Central Processing Unit
DBSCAN   Density-Based Spatial Clustering of Applications with Noise
FFNN     Feed-Forward Neural Network
GRNN     Generalized Regression Neural Network
IB       In-Bag
I-V      Current-Voltage
MPP      Maximum-Power Point
MAPE     Mean Absolute Percentage Error
MBE      Mean Bias Error
MSE      Mean Squared Error
OOB      Out-Of-Bag
PV       Photovoltaic
RFs      Random Forests
RMSE     Root-Mean-Squared Error
VI       Variable Importance

Symbols
$I$                     Output current of the cell model
$I_{Ph}$                Generated photocurrent
$I_0$                   Diode reverse-saturation current
$R_s$                   Series resistance
$R_{sh}$                Shunt resistance
$V$                     Output voltage of the PV cell
$V_t$                   Thermal voltage
$a$                     Diode ideality factor
$k_b$                   Boltzmann's constant
$q$                     Charge of the electron
$T$                     Module temperature [K]
$I_{sc}$                Short-circuit current
$V_{oc}$                Open-circuit voltage
$I_{mpp}$               PV output current at the MPP
$P_{mpp}$               Power output at the MPP
$X$                     Input vector of the RFs algorithm
$Y$                     Expected output vector from the RFs algorithm
$x_i$                   Arrived variable of the bootstrap sample
$x_{expect}$            Expected output sample inside a tree
$\hat{Y}(X_i)$          Final predicted output for a given input sample $X_i$
$Y_i$                   Actual output of the RFs model at iteration $i$
$N$                     Total number of OOB samples
$Y_h(\theta_i, x)$      Regression-tree basic learner at any $i \neq j$
$Y_h(\theta_j, x)$      Regression-tree basic learner at any $j \neq i$
$\rho$                  Weighted correlation between $Y_h(\theta_i, x)$ and $Y_h(\theta_j, x)$
$\theta_i$              Independent random vector with the same distribution as $x$
$\epsilon_r$            Average error of all the trees in the forest
$\epsilon_{tree}$       Error of each tree
$\epsilon_\theta$       Expectation relative to the random parameter $\theta$
$\beta_c(k)$            OOB samples for each tree in the forest
$k$                     Index of a tree in the forest ($k = 1, 2, \ldots, K$)
$K$                     Total number of trees in the forest
$L$                     True label
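
The error measures abbreviated above (RMSE, MBE and MAPE) are the ones used in the abstract to report prediction quality. The following is a minimal sketch of how they are commonly computed for a predicted I-V curve, not the authors' implementation. The function names, the sign convention for MBE (predicted minus actual) and the sample values in the usage note are assumptions made for illustration. The sketch also assumes that every actual current value is non-zero, so points at or near open-circuit voltage would need to be excluded before computing MAPE.

```python
import math


def rmse(actual, predicted):
    """Root-mean-squared error, in the same unit as the inputs (e.g. A)."""
    n = len(actual)
    return math.sqrt(sum((p - a) ** 2 for a, p in zip(actual, predicted)) / n)


def mbe(actual, predicted):
    """Mean bias error; positive means over-prediction on average.

    Assumed sign convention: predicted minus actual.
    """
    n = len(actual)
    return sum(p - a for a, p in zip(actual, predicted)) / n


def mape(actual, predicted):
    """Mean absolute percentage error, in percent.

    Assumes every value in `actual` is non-zero.
    """
    n = len(actual)
    return 100.0 * sum(abs((p - a) / a) for a, p in zip(actual, predicted)) / n
```

As a usage example, `rmse([1.0, 2.0, 4.0], [1.1, 1.9, 4.2])` evaluates the three hypothetical current samples against their predictions. The accuracy quoted in the abstract is numerically consistent with 100% minus the MAPE (100 − 0.1392 ≈ 99.86), although that relation is inferred here rather than stated in this section.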