Engineering Applications of Artificial Intelligence 77 (2019) 283–310
Engineering Applications of Artificial Intelligence
Stability-based Dynamic Bayesian Network method for dynamic data mining
Mohamed Naili a,∗, Mustapha Bourahla b, Makhlouf Naili c, AbdelKamel Tari d
a Department of Computer Science, Faculty of Mathematics and Informatics, University of Bordj Bou Arreridj, 34030 Bordj Bou Arreridj, Algeria
b Department of Computer Science, University of M’Sila, 28000 M’Sila, Algeria
c Department of Computer Science, University of Biskra, 07000 Biskra, Algeria
d Laboratory of Medical Computing (LIMED), Faculty of Fundamental Sciences, University of Bejaia, 06000 Bejaia, Algeria
ARTICLE INFO
Keywords:
Dynamic data mining
Dynamic model
Stability
Dynamic Bayesian Network
Grow-Shrink algorithm
Modeling and simulation
ABSTRACT
In this article, we introduce a new stability-based Dynamic Bayesian Network method for dynamic systems represented by their time series. Based on the Grow-Shrink algorithm and on the stability of the network over time, new variables and arcs can be added to the network in order to generate missing data or predict future values. The concept of stability in the network is maintained through a stability matrix, which contains learned values indicating the strength of the dependencies between variables over time. Moreover, we apply the proposed method to the problem of prediction in a real-life air quality case study, in which we try to predict the level of carbon monoxide in the air, and we compare the results obtained using the proposed method with those obtained using the Vector Autoregression model.
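The Grow-Shrink algorithm named above is, in its standard form, a constraint-based procedure that estimates the Markov blanket of a target variable from conditional-independence tests: a grow phase adds every variable that is still dependent on the target given the current blanket, and a shrink phase removes members that turn out to be independent of the target given the remaining members. The minimal Python sketch below implements only that standard procedure, not the stability-based extension proposed in this article. The function name, the `indep` callback and the toy oracle are hypothetical; with real data, `indep` would be a statistical conditional-independence test.

```python
from typing import Callable, FrozenSet, Hashable, List, Sequence, Set

# Hypothetical CI-test signature: indep(x, y, z) returns True when x is judged
# conditionally independent of y given the set of variables z.
IndepTest = Callable[[Hashable, Hashable, FrozenSet[Hashable]], bool]


def grow_shrink_markov_blanket(
    target: Hashable, variables: Sequence[Hashable], indep: IndepTest
) -> Set[Hashable]:
    """Standard Grow-Shrink estimate of the Markov blanket of `target`."""
    mb: List[Hashable] = []

    # Grow phase: add any variable still dependent on the target given the
    # current blanket, until a full pass adds nothing.
    changed = True
    while changed:
        changed = False
        for y in variables:
            if y == target or y in mb:
                continue
            if not indep(target, y, frozenset(mb)):
                mb.append(y)
                changed = True

    # Shrink phase: drop false positives, i.e. members that are independent
    # of the target given the rest of the blanket.
    changed = True
    while changed:
        changed = False
        for y in list(mb):
            rest = frozenset(v for v in mb if v != y)
            if indep(target, y, rest):
                mb.remove(y)
                changed = True

    return set(mb)


# Toy oracle (hypothetical): A and B are direct neighbours of X,
# C only influences X through A (C -> A -> X), and D is unrelated to X.
def toy_oracle(x: Hashable, y: Hashable, z: FrozenSet[Hashable]) -> bool:
    if y in ("A", "B"):
        return False
    if y == "C":
        return "A" in z
    return True


print(sorted(grow_shrink_markov_blanket("X", ["C", "A", "B", "D"], toy_oracle)))
# ['A', 'B']
```

In this toy run, C enters the blanket during the grow phase because it is tested first, against an empty blanket, and is removed during the shrink phase once A is in the blanket.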
1. Introduction
One of the major challenges in a dynamic system is predicting missing or future data. Unfortunately, most real-life systems require considerable time to collect sufficient data (with a high likelihood of missing values) before any analysis or data mining can be carried out to extract useful knowledge.
Variation in real-life processes makes the integration of new methods a necessity. Several methods have already been proposed, including Autoregressive Integrated Moving Average (ARIMA) and Vector Autoregression (VAR) models, Artificial Neural Networks (ANN), and Bayesian Networks (BN) together with their dynamic extension, Dynamic Bayesian Networks (DBN).
A classical way to handle the problem of prediction is the ARIMA model. An ARIMA model describes the relationship between the current value of a given variable and its previous values and errors in order to forecast future values. Several publications have shown how such models can be used (either alone or in combination with other types of models) in domains such as traffic flow, water quality, cloud computing and economics (Williams and Hoel, 2003; Ömer Faruk, 2010; Zhang, 2003; Contreras et al., 2003; Calheiros et al., 2015; Babu and Reddy, 2014; Christodoulos et al., 2010; Narendra Babu and Eswara Reddy, 2015).
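As a rough illustration of forecasting a variable from its own past, the Python sketch below fits the simplest non-trivial case, an ARIMA(1,1,0) model (one differencing step, one autoregressive term and no moving-average error terms), by ordinary least squares and returns a one-step-ahead forecast. The function names are hypothetical and the estimator is a simplification: full ARIMA implementations usually estimate parameters by maximum likelihood and also model the error terms that this sketch omits.

```python
from typing import List, Tuple


def fit_ar1(d: List[float]) -> Tuple[float, float]:
    """Least-squares fit of d[t] = c + a * d[t-1]; returns (c, a)."""
    x = d[:-1]  # lagged values d[t-1]
    y = d[1:]   # current values d[t]
    mx = sum(x) / len(x)
    my = sum(y) / len(y)
    sxx = sum((xi - mx) ** 2 for xi in x)
    if sxx == 0.0:
        # Constant lagged values: slope not identifiable, fall back to the mean.
        return my, 0.0
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    a = sxy / sxx
    return my - a * mx, a


def arima110_forecast(series: List[float]) -> float:
    """One-step forecast from a simplified ARIMA(1,1,0) fit (no MA terms)."""
    if len(series) < 4:
        raise ValueError("need at least 4 observations")
    # "I" part: first differences remove a linear trend.
    d = [series[t] - series[t - 1] for t in range(1, len(series))]
    # "AR" part: regress each difference on the previous difference.
    c, a = fit_ar1(d)
    next_diff = c + a * d[-1]
    # Undo the differencing to return to the original scale.
    return series[-1] + next_diff


# Differences 1, 1.5, 1.75, ... follow d[t] = 1 + 0.5 * d[t-1] exactly.
series = [10.0, 11.0, 12.5, 14.25, 16.125, 18.0625, 20.03125]
print(arima110_forecast(series))
```

Because the differences of this made-up series lie exactly on the line d[t] = 1 + 0.5 d[t-1], the fit recovers c ≈ 1 and a ≈ 0.5, and the forecast is approximately 20.03125 + 1.984375 = 22.015625.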
∗ Corresponding author. E-mail address: mohamednailimail@gmail.com (M. Naili).
One important drawback of ARIMA models is the difficulty of representing the relationships between different variables in a compact form. This stems from the fact that the ARIMA model was conceived mainly for univariate problems. Because of this limitation, the VAR model (Hamilton, 1994; Montgomery et al., 2008) has been proposed to deal with multivariate time series by modeling the underlying dependencies between variables. In this model, a given variable can be predicted on the basis of its own previous values and those of other variables.
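In a VAR model of order 1, the whole vector of variables is predicted as x_t = c + A x_{t-1}, so row i of the coefficient matrix A weights the previous values of every variable in the equation for variable i. The Python sketch below shows one-step and iterated multi-step prediction; the function names and the coefficient values are hypothetical, and in practice c and A would be estimated from data (commonly by least squares, equation by equation).

```python
from typing import List, Sequence

Matrix = Sequence[Sequence[float]]


def var1_predict(c: Sequence[float], A: Matrix, x_prev: Sequence[float]) -> List[float]:
    """One-step VAR(1) prediction: x_t = c + A x_{t-1}."""
    k = len(c)
    if len(A) != k or any(len(row) != k for row in A) or len(x_prev) != k:
        raise ValueError("dimension mismatch")
    # Each variable's prediction mixes the lagged values of all k variables.
    return [c[i] + sum(A[i][j] * x_prev[j] for j in range(k)) for i in range(k)]


def var1_forecast(
    c: Sequence[float], A: Matrix, x_last: Sequence[float], steps: int
) -> List[List[float]]:
    """Multi-step forecast by feeding each prediction back as the next input."""
    out: List[List[float]] = []
    x = list(x_last)
    for _ in range(steps):
        x = var1_predict(c, A, x)
        out.append(x)
    return out


# Hypothetical two-variable system.
c = [1.0, 0.5]
A = [[0.5, 0.25],
     [0.125, 0.75]]
print(var1_forecast(c, A, [2.0, 4.0], steps=2))
# [[3.0, 3.75], [3.4375, 3.6875]]
```

The off-diagonal entries of A (0.25 and 0.125 here) are what a univariate ARIMA model cannot express: they let the past of one variable inform the prediction of another.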
Artificial Neural Network (ANN) models are designed to process a set of inputs through an input layer in order to calculate outputs. This is achieved by modeling the possible non-linear relations between inputs through a set of functions and connections between the “neurons” of the network. Despite the difficulty of explaining the parameters learned during the training phase, ANN models have demonstrated a strong ability to model real-life static and dynamic systems, in applications such as flood flow and energy consumption forecasting, stock market prediction and so on (Foster et al., 1992; Amrouche and Le Pivert, 2014; Qiu et al., 2016; Chae et al., 2016).
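To make the idea of inputs flowing through layers of “neurons” concrete, the Python sketch below computes the forward pass of a small feed-forward network with one tanh hidden layer and a linear output layer. The layer sizes, the tanh activation and all weight values are hypothetical choices for illustration; training (learning the weights) is not shown.

```python
import math
from typing import Callable, List, Optional, Sequence

Matrix = Sequence[Sequence[float]]


def dense(
    x: Sequence[float],
    W: Matrix,
    b: Sequence[float],
    activation: Optional[Callable[[float], float]] = None,
) -> List[float]:
    """Fully connected layer: y[j] = activation(sum_i W[j][i] * x[i] + b[j])."""
    out = []
    for row, bias in zip(W, b):
        z = sum(w * xi for w, xi in zip(row, x)) + bias
        out.append(activation(z) if activation is not None else z)
    return out


def mlp_forward(
    x: Sequence[float], W1: Matrix, b1: Sequence[float], W2: Matrix, b2: Sequence[float]
) -> List[float]:
    """Input layer -> tanh hidden layer -> linear output layer."""
    hidden = dense(x, W1, b1, math.tanh)
    return dense(hidden, W2, b2)


# Hypothetical weights; in practice they are learned during training.
W1 = [[1.0, -1.0], [0.5, 0.5]]
b1 = [0.0, 0.0]
W2 = [[1.0, 1.0]]
b2 = [0.0]

y1 = mlp_forward([1.0, 1.0], W1, b1, W2, b2)[0]  # tanh(0) + tanh(1)
y2 = mlp_forward([2.0, 2.0], W1, b1, W2, b2)[0]  # tanh(0) + tanh(2)
print(y1, y2)
```

Doubling the input does not double the output (tanh(2) ≈ 0.964, whereas 2·tanh(1) ≈ 1.523), which is the non-linearity the paragraph refers to. The same example also hints at the explainability problem: the individual weights carry no direct meaning for the modeled system.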
The need to explain outputs has driven many researchers towards the use of Bayesian network models. A Bayesian network is a Directed Acyclic Graph (DAG) used to represent dependencies among a set of variables. Bayesian networks, and especially their temporal extension, the Dynamic Bayesian Network (Robinson and Hartemink, 2008), have been used for financial purposes (Kita et al., 2012), in healthcare and biological systems (Sandri et al., 2014; van der Heijden et al., 2014; Acerbi et al., 2016), traffic flow forecasting (Queen and Albers, 2009), monitoring (Lv et al., 2013; Wenhui et al., 2005; An et al., 2013; Cheng et al., 2012) and many other applications.
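The dependencies encoded by the DAG determine how a Bayesian network assigns probabilities: the joint distribution factorizes as the product of each variable's probability given its parents, P(x_1, ..., x_n) = ∏_i P(x_i | parents(x_i)). A Dynamic Bayesian Network applies the same factorization to a network unrolled over time slices, with the same conditional tables reused at every step. The Python sketch below shows both cases on hypothetical binary variables with made-up probabilities; it is not the stability-based method proposed in this article.

```python
from itertools import product
from typing import Dict, Sequence, Tuple

# Hypothetical three-node DAG with binary variables: Rain -> WetGrass <- Sprinkler.
PARENTS: Dict[str, Tuple[str, ...]] = {
    "Rain": (),
    "Sprinkler": (),
    "WetGrass": ("Rain", "Sprinkler"),
}
# Conditional probability tables: parent values -> P(variable = True | parents).
CPT: Dict[str, Dict[Tuple[bool, ...], float]] = {
    "Rain": {(): 0.2},
    "Sprinkler": {(): 0.4},
    "WetGrass": {
        (True, True): 0.99,
        (True, False): 0.9,
        (False, True): 0.8,
        (False, False): 0.0,
    },
}


def joint_probability(assignment: Dict[str, bool]) -> float:
    """Chain rule over the DAG: P(x_1, ..., x_n) = prod_i P(x_i | parents(x_i))."""
    p = 1.0
    for var, pa in PARENTS.items():
        p_true = CPT[var][tuple(assignment[q] for q in pa)]
        p *= p_true if assignment[var] else 1.0 - p_true
    return p


def dbn_sequence_probability(
    seq: Sequence[bool], p_init: float, p_true_given_prev: Dict[bool, float]
) -> float:
    """Smallest unrolled DBN: one binary variable X with the arc X[t-1] -> X[t].

    P(x_0, ..., x_T) = P(x_0) * prod_t P(x_t | x_{t-1}), reusing the same
    transition table at every time slice.
    """
    p = p_init if seq[0] else 1.0 - p_init
    for prev, cur in zip(seq, seq[1:]):
        p_true = p_true_given_prev[prev]
        p *= p_true if cur else 1.0 - p_true
    return p


print(joint_probability({"Rain": True, "Sprinkler": False, "WetGrass": True}))
print(dbn_sequence_probability([True, True, False], 0.5, {True: 0.7, False: 0.2}))
```

The first call evaluates 0.2 × 0.6 × 0.9 ≈ 0.108 and the second 0.5 × 0.7 × 0.3 ≈ 0.105. Only three small tables are needed for the static network instead of a full table over all eight joint assignments, which is what makes the dependencies readable.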
https://doi.org/10.1016/j.engappai.2018.09.016
Received 27 September 2017; Received in revised form 12 September 2018; Accepted 23 September 2018
Available online xxxx
0952-1976/© 2018 Elsevier Ltd. All rights reserved.
As mentioned previously, VAR models are used to predict a given variable’s future values on the basis of its own and other variables’ previous