sustainability

Article

Time Series Data Preparation for Failure Prediction in Smart Water Taps (SWT)

Nsikak Mitchel Offiong 1,*, Fayyaz Ali Memon 1 and Yulei Wu 2

1 Centre for Water Systems, University of Exeter, Exeter EX4 4QF, UK
2 Department of Computer Science, EMPS, University of Exeter, Exeter EX4 4QF, UK
* Correspondence: no270@exeter.ac.uk

Citation: Offiong, N.M.; Memon, F.A.; Wu, Y. Time Series Data Preparation for Failure Prediction in Smart Water Taps (SWT). Sustainability 2023, 15, 6083. https://doi.org/10.3390/su15076083

Academic Editors: Ximing Cai and Erhu Du

Received: 13 January 2023
Revised: 17 February 2023
Accepted: 22 February 2023
Published: 31 March 2023

Copyright: © 2023 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).

Abstract: Smart water tap (SWT) time series model development for failure prediction requires acquiring data on the variables of interest to researchers, planners, engineers and decision makers. Thus, the data are expected to be ‘noiseless’ (i.e., without discrepancies such as missing data, data redundancy and data duplication) raw inputs for modelling and forecasting tasks. However, historical datasets acquired from the SWTs contain data discrepancies that require preparation before applying the dataset to develop a failure prediction model. This paper presents a combination of the generative adversarial network (GAN) and the bidirectional gated recurrent unit (BiGRU) techniques for missing data imputation. The GAN aids in training the SWT data trend and distribution, enabling the imputed data to be closely similar to the historical dataset. On the other hand, the BiGRU was adopted to save computational time by combining the model’s cell state and hidden state during data imputation.
After data imputation, there were outliers, and the exponential smoothing method was used to balance the data. The results show that this method can be applied in time series systems to correct missing values in a dataset, thereby mitigating data noise that can lead to a biased failure prediction model. Furthermore, when evaluated using different sets of historical SWT data, the method proved reliable for missing data imputation and achieved better training time than the traditional data imputation method.

Keywords: missing data; generative adversarial network; bidirectional gated recurrent unit; smart water tap; failure prediction; data imputation

1. Introduction

A sustainable solution for rural water delivery requires accurate water infrastructure assessment and efficient data processing techniques. These techniques need data, which should come from regular usage of the water infrastructure. However, most rural water installations lack accurate data from the available repository [1]. Therefore, with inadequate, partial, or missing data regarding the smart water taps, it is difficult to develop a comprehensive failure prediction model or an early warning system. Furthermore, investment in extensive inspection and data-gathering programmes on smart rural taps to overcome data gaps may not be financially feasible for rural water management agencies [2]. Thus, to achieve failure prediction for rural water taps, the available time series data generated from the system usage and data manipulation is sufficient for critical analysis irrespective of the discrepancies. Part of the aim of this paper is to develop a failure prediction model for smart water taps to support proactive maintenance, which can help provide a sustainable water supply to rural communities in sub-Saharan Africa and similar contexts.

Solar-powered smart water taps (SWT) deployed to rural areas in some parts of Africa are perceived as low-cost and reliable water supply sources for domestic use in the region. These SWTs, often referred to as e-taps, dispense water when a pre-paid token comes in contact with them. During their functional time, the smart taps generate time series datasets that can be analysed