Engineering Applications of Artificial Intelligence 77 (2019) 283–310

Journal homepage: www.elsevier.com/locate/engappai

Stability-based Dynamic Bayesian Network method for dynamic data mining

Mohamed Naili a,*, Mustapha Bourahla b, Makhlouf Naili c, AbdelKamel Tari d

a Department of Computer Science, Faculty of Mathematics and Informatics, University of Bordj Bou Arreridj, 34030 Bordj Bou Arreridj, Algeria
b Department of Computer Science, University of M’Sila, 28000 M’sila, Algeria
c Department of Computer Science, University of Biskra, 07000 Biskra, Algeria
d Laboratory of Medical Computing (LIMED), Faculty of Fundamental Sciences, University of Bejaia, 06000 Bejaia, Algeria

ARTICLE INFO

Keywords: Dynamic data mining; Dynamic model; Stability; Dynamic Bayesian Network; Grow-Shrink algorithm; Modeling and simulation

ABSTRACT

In this article we introduce a new stability-based dynamic Bayesian network method for dynamic systems represented by their time series. Based on the Grow-Shrink algorithm and the stability of the network through time, new variables and arcs can be added to the network in order to generate missing data or predict future values. The concept of stability in the network is maintained through a stability matrix, which contains learned values that indicate the strength of the dependencies between variables over time. Moreover, we apply the proposed method to the problem of prediction in a real-life air quality case study, in which we try to predict the level of carbon monoxide in the air, and we compare the results obtained using the proposed method with those obtained using the Vector Autoregression model.

1. Introduction

One of the major challenges to address in a dynamic system is predicting missing or future data.
Unfortunately, most real-life systems require considerable time to collect sufficient data (with a considerable possibility of missing data) before any analysis or data mining can be carried out to extract useful knowledge. Variation in real-life processes makes the integration of new methods a necessity.

Several methods have already been proposed. These include the Autoregressive Integrated Moving Average (ARIMA) and Vector Autoregression (VAR) models, Artificial Neural Networks (ANN), and the Bayesian Network (BN) with its dynamic extension, known as the Dynamic Bayesian Network (DBN).

A classical way to handle the problem of prediction is the ARIMA model. An ARIMA model describes the relationship between the current value of a given variable and its previous values and errors, in order to forecast future values. Several publications have shown how such models can be used (either alone or in combination with other types of models) in traffic flow, water quality, cloud computing and economics (Williams and Hoel, 2003; Ömer Faruk, 2010; Zhang, 2003; Contreras et al., 2003; Calheiros et al., 2015; Babu and Reddy, 2014; Christodoulos et al., 2010; Narendra Babu and Eswara Reddy, 2015).

One important drawback of ARIMA models is the difficulty of representing the relationships between different variables in a compact form. This stems from the fact that the ARIMA model was conceived mainly for univariate problems. Because of this limitation, the VAR model (Hamilton, 1994; Montgomery et al., 2008) has been proposed to deal with multivariate time series by modeling the underlying dependencies between variables. In this model, a given variable can be predicted on the basis of its previous values and those of other variables.

* Corresponding author. E-mail address: mohamednailimail@gmail.com (M. Naili).

Artificial Neural Network (ANN) models have been designed to process a set of inputs through an input layer in order to calculate outputs.
This is achieved by modeling the possible non-linear relations between inputs through a set of functions and connections between the "neurons" of the network. Despite the difficulty of explaining the parameters learned during the training phase, ANN models have demonstrated a strong ability to model real-life static and dynamic systems, in tasks such as flood flow modeling, energy consumption forecasting and stock market prediction (Foster et al., 1992; Amrouche and Le Pivert, 2014; Qiu et al., 2016; Chae et al., 2016).

The necessity to explain outputs has driven many researchers towards the use of Bayesian network models. A Bayesian network is a Directed Acyclic Graph (DAG) used to represent dependencies among a set of variables. The Bayesian Network, and especially its temporal extension, the Dynamic Bayesian Network (Robinson and Hartemink, 2008), has been used for financial purposes (Kita et al., 2012), in healthcare and biological systems (Sandri et al., 2014; van der Heijden et al., 2014; Acerbi et al., 2016), in traffic flow forecasting (Queen and Albers, 2009), in monitoring (Lv et al., 2013; Wenhui et al., 2005; An et al., 2013; Cheng et al., 2012) and in many other applications.

https://doi.org/10.1016/j.engappai.2018.09.016
Received 27 September 2017; Received in revised form 12 September 2018; Accepted 23 September 2018
Available online xxxx
0952-1976/© 2018 Elsevier Ltd. All rights reserved.

As mentioned previously, VAR models are used to predict a given variable’s future values on the basis of its own and other variables’ previous