Date of publication xxxx 00, 0000, date of current version xxxx 00, 0000.
Digital Object Identifier 10.1109/ACCESS.2017.DOI

Multivariate Time Series Imputation: A Survey on available Methods with a Focus on hybrid GANs

ANNA RICHTER*¹, JYOTIRMAYA IJARADAR*¹,², ULF WETZKER¹, Dr. VINEETA JAIN¹,³, Dr. ANDREAS FROTZSCHER¹

¹Fraunhofer Institute for Integrated Circuits, Division for Development of adaptive Systems, Dresden, 01187 Germany (e-mail: anna.richter, jyotirmaya.ijaradar, ulf.wetzker, vineeta.jain, andreas.frotzscher@eas.iis.fraunhofer.de)
²Technical University of Dresden, Department Computer Science, Dresden, 01187 Germany (e-mail: jyotirmaya.ijaradar@mailbox.tu-dresden.de)
³LNM Institute of Information Technology Jaipur, India

Corresponding author: Anna Richter (e-mail: anna.richter@eas.iis.fraunhofer.de).

This work was supported by the Bavarian Ministry of Economic Affairs, Regional Development and Energy through the Center for Analytics – Data – Applications (ADA-Center) within the framework of BAYERN DIGITAL II (20-3410-2-9-8).

ABSTRACT Multivariate time series (MTS) are captured in a great variety of real-world applications. However, analysing and modelling the data for classification and forecasting purposes can become very challenging if values are missing in the data set. The need for imputation methods to fill the gaps in MTS is well known, and a great variety of algorithms for this task has accordingly been proposed in the literature. Nevertheless, the research community continues to develop advanced algorithms that fulfill the special requirements of multidimensional temporal data, since most existing imputation methods treat MTS as ordinary structured data and fail to model the temporal relationships within and between sequences of observations. MTS imputation research currently focuses mainly on deep learning (DL) models, especially models that make use of generative adversarial networks (GANs).
In this survey, we present a general categorization of imputation algorithms and introduce groups of hybrid GAN models used for the MTS imputation task, which we investigate and discuss in detail. Based on our findings in the literature, we present a quantitative comparison of the hybrid GANs' performance in MTS imputation.

INDEX TERMS Deep Learning, Generative Adversarial Networks, Hybrid GANs, Imputation, Missing Values, Multivariate Time Series

I. INTRODUCTION

The goal of time series analysis is to create a model that accurately depicts the structure of a series and can be used to predict and classify future events based on past observations. Time series analysis is becoming increasingly popular in a variety of real-world applications, including environmental modeling [1], [2], traffic forecasting [3], health monitoring [4], and autonomous driving [5]. Owing to extensive recent research, time series modeling has advanced from simple linear models to more powerful deep learning (DL) networks. Nonetheless, most models focus on simple time series data sets [4], whereas real-world time series observations are usually not limited to a single independent variable. Furthermore, even if all variables are sampled at a constant rate, data are frequently missing due to transmission issues or broken sensors [6]. Because of the diversity of measurement strategies and data acquisition devices, missing values for one or more variables are quite common. For some data sets, the missing rate can reach up to 90% [7]. The PT08.S1 data set [8] on Italian air quality, for example, has a missing rate of 34% [9], while the PhysioNet 2012 data set [10] of medical records has a missing rate of 80% [11]. Numerous approaches for handling incomplete MTS data have been developed; they can be divided into two superordinate classes: deletion and imputation [12].
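To make the missing-rate figures above concrete, the following minimal Python sketch computes the missing rate of a small toy MTS. It assumes that the missing rate is defined as the fraction of missing entries over all time steps and variables, and that missing values are encoded as NaN; the helper names (`missing_rate`, `per_variable_missing_rate`, `complete_step_fraction`) and the toy values are hypothetical illustrations, not taken from the cited works or data sets.

```python
import math
from typing import List

# Hypothetical representation: an MTS is a list of time steps, each holding
# one value per variable; math.nan marks a missing observation (assumption).
Series = List[List[float]]


def missing_mask(mts: Series) -> List[List[bool]]:
    """Return True wherever an entry is missing (NaN)."""
    return [[math.isnan(v) for v in step] for step in mts]


def missing_rate(mts: Series) -> float:
    """Fraction of all (time step, variable) entries that are missing."""
    mask = missing_mask(mts)
    total = sum(len(row) for row in mask)
    missing = sum(sum(row) for row in mask)  # True counts as 1
    return missing / total if total else 0.0


def per_variable_missing_rate(mts: Series) -> List[float]:
    """Missing rate of each variable (column), computed separately."""
    mask = missing_mask(mts)
    n_steps = len(mask)
    n_vars = len(mask[0]) if mask else 0
    return [sum(mask[t][d] for t in range(n_steps)) / n_steps
            for d in range(n_vars)]


def complete_step_fraction(mts: Series) -> float:
    """Fraction of time steps in which every variable is observed."""
    mask = missing_mask(mts)
    if not mask:
        return 0.0
    return sum(not any(row) for row in mask) / len(mask)


if __name__ == "__main__":
    nan = math.nan
    # Toy MTS: 4 time steps, 3 variables (illustrative values only).
    mts = [
        [1.0, 2.0, nan],
        [1.1, nan, 0.5],
        [1.2, 2.2, 0.6],
        [nan, 2.3, 0.7],
    ]
    print(missing_rate(mts))
    print(per_variable_missing_rate(mts))
    print(complete_step_fraction(mts))
```

In this toy series only a quarter of the entries are missing, yet only one of the four time steps is fully observed, because each variable is missing at a different time. This illustrates why even moderate missing rates in MTS leave few complete observations to work with.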
Deletion is carried out either listwise or pairwise [13], mainly removing samples or features that are only partially observed. Note, however, that deletion leaves gaps in the data set, possibly resulting in erroneous parameter estimations VOLUME 4, 2016 1