FIR Forecasting Strategies Able to Cope with Missing Data: A Smart Grid Application

Sergio Jurado a, Àngela Nebot b, Fransisco Mugica b and Mihail Mihaylov c

a Sensing & Control Systems, Aragó 208-210, 08011 Barcelona, Spain
Email: sergio.jurado@sensingcontrol.com
Email: s.juradogomez@gmail.com
Phone: +34 605 565 303
b Soft Computing research group, Technical University of Catalonia, Jordi Girona 1-3, 08034 Barcelona, Spain
Email: {angela,fmugica}@lsi.upc.edu
c AI lab, Vrije Universiteit Brussel, Pleinlaan 2, 1050 Brussels, Belgium
Email: mmihaylo@vub.ac.be

Abstract: Dealing with missing data is of great practical and theoretical interest in forecasting applications. In this study, we address the problem of forecasting with missing data in smart grid and smart home applications, where information from home area sensors and/or smart meters is sometimes missing, which may hinder or even prevent forecasting for the next hours and days. Specifically, we focus on a Soft Computing technique called Fuzzy Inductive Reasoning (FIR) and on flexible FIR, its improved version that can cope with missing information in the forecasting process. In this article, eight different strategies for flexible FIR forecasting are defined and studied, taking into account the causal relevance of input variables, the consistency of predictions, an inertia criterion and K-Nearest Neighbours. Furthermore, we evaluate the effect on prediction accuracy and on the number of registers predicted when the number of Missing Values (MVs) in the training dataset is increased progressively. To this end, a real smart grid forecasting application, namely electricity load forecasting, has been chosen. The results show that all eight proposed strategies are able to cope with MVs and take advantage of the information inherent in the data, with better results for the strategies that make use of causal relevance.
In addition, the robustness of flexible FIR and its eight strategies is demonstrated by the fact that, on average, 96.15% of the registers were predicted when the percentage of MVs in the training dataset was around 73%.

Keywords: Soft Computing, Fuzzy Inductive Reasoning, Entropy-based Feature Selection, Prediction with Missing Values, Energy Modelling

1. Introduction

The problem of missing data is of great practical and theoretical interest in forecasting applications. It is important to know how to react when Missing Values (MVs) are present, both during model generation and during offline/online forecasting. For example, when a sensor fails in a production process, it might not be necessary to stop everything if sufficient information is implicitly contained in the remaining sensor data. Similarly, in economic forecasting, one might want to continue using a predictor even when an input variable becomes meaningless (for example, due to political changes in a country). In real breast cancer problems [1], missing data imputation is an important task in cases where it is crucial to use all available data and not discard records with missing values.

In this study, we deal with the missing data problem in smart grid and smart home forecasting applications. The information that arrives from the different sensors in the home area network and/or the smart meters may contain missing data, which may hinder or even prevent forecasting for the next hours and days. This issue is observed in projects such as iURBAN [2] and GreenCom [3], where either the smart metering infrastructure or the smart home gateway occasionally fails to send data correctly. This may be caused by a loss of the gateway's Internet connection, a failure in the communication between smart meters and concentrators, or issues between the database interfaces.
In many studies, the problem of MVs is treated from a pre-processing perspective. Conventional missing data imputation techniques, such as substituting the mean for an unknown feature, are studied in [4]; such techniques can lead to solutions that are far from optimal. In [5], Lakshminarayan et al. explore the use of machine-learning-based alternatives to standard statistical data completion methods for dealing with missing data. Barladi et al. [6] propose a novel method for missing data reconstruction by fuzzy similarity. Other studies where the missing data problem is approached from a pre-processing point of view are [7] and [8]. However, conventional missing data deletion techniques like list-wise and pair-wise

© 2016 Elsevier. This manuscript version is made available under the CC-BY-NC-ND 4.0 license http://creativecommons.org/licenses/by-nc-nd/4.0/