FIR Forecasting Strategies Able to Cope with Missing Data:
A Smart Grid Application
Sergio Jurado ᵃ, Àngela Nebot ᵇ, Fransisco Mugica ᵇ and Mihail Mihaylov ᶜ
ᵃ Sensing & Control Systems, Aragó 208-210, 08011 Barcelona, Spain
Email: sergio.jurado@sensingcontrol.com
Email: s.juradogomez@gmail.com
Phone: +34 605 565 303
ᵇ Soft Computing research group, Technical University of Catalonia, Jordi Girona 1-3, 08034 Barcelona, Spain
Email:{angela,fmugica}@lsi.upc.edu
ᶜ AI lab, Vrije Universiteit Brussel, Pleinlaan 2, 1050 Brussels, Belgium
Email: mmihaylo@vub.ac.be
Abstract: Dealing with missing data is of great practical and theoretical interest in forecasting applications. In this
study, we address the problem of forecasting with missing data in smart grid and smart home applications, where
the information from home area sensors and/or smart meters is sometimes missing, which may hinder or even prevent
the forecasting of the next hours and days. Specifically, we focus on a Soft Computing technique called Fuzzy
Inductive Reasoning (FIR) and on flexible FIR, its improved version that can cope with missing information in the
forecasting process. In this article, eight different strategies for flexible FIR forecasting are defined and studied,
taking into account the causal relevance of input variables, the consistency of predictions, an inertia criterion and
K-Nearest Neighbours. Furthermore, we evaluate the effect on prediction accuracy and on the number of registers
predicted when the number of Missing Values (MVs) in the training dataset is increased progressively. To this end, a
real smart grid forecasting application, namely electricity load forecasting, has been chosen. The results show that all
eight proposed strategies are able to cope with MVs and take advantage of the inherent information in the data, with
better results for the strategies that make use of causal relevance. In addition, the robustness of flexible FIR and its
eight strategies is demonstrated by the fact that the percentage of registers predicted is on average 96.15% when
the percentage of MVs in the training dataset is around 73%.
Keywords: Soft Computing, Fuzzy Inductive Reasoning, Entropy-based Feature Selection, Prediction with Missing
Values, Energy Modelling
1. Introduction
The problem of missing data is of great practical and theoretical interest in forecasting applications. It is
important to know how to react to certain situations where Missing Values (MVs) are present both during
the model generation and the offline/online forecasting. As an example, when a sensor fails in a
production process, it might not be necessary to stop everything if sufficient information is implicitly
contained in the remaining sensor data. Furthermore, in economic forecasting, one might want to continue
to use a predictor even when an input variable becomes meaningless (for example, due to political
changes in a country). In real breast cancer problems [1], missing data imputation is an important task in
cases where it is crucial to use all available data and not discard records with missing values.
In this study, we deal with the missing data problem in smart grid or smart home forecasting applications.
The information that arrives from the different sensors in the home area network and/or the smart meters
may contain missing data, which may hinder or even prevent the forecasting of the next hours and days.
This issue is observed in projects such as iURBAN [2] and GreenCom [3], where either the smart
metering infrastructure or the smart home gateway occasionally does not send data correctly. This may
be caused by a loss of the gateway's Internet connection, a failure in the communication between smart
meters and concentrators, or issues between the database interfaces.
In many studies, the problem of MVs is treated from a pre-processing perspective. Conventional missing
data imputation techniques, such as substituting the mean for an unknown feature, are studied in
[4]; they can lead to solutions that are far from optimal. In [5], Lakshminarayan et al. explore the use of
machine-learning-based alternatives to standard statistical data completion methods for dealing with
missing data. Barladi et al. [6] propose a novel method for missing data reconstruction by fuzzy
similarity. Other studies where the missing data problem is approached from a pre-processing point of
view are [7] and [8]. However, conventional missing data deletion techniques like list-wise and pair-wise
© 2016 Elsevier. This manuscript version is made available under the CC-BY-NC-ND 4.0 license
http://creativecommons.org/licenses/by-nc-nd/4.0/