Data mining process for modeling hydrological time series
M. Erol Keskin, Dilek Taylan and Ecir Ugur Kucuksille
ABSTRACT
The main purpose of this study was to develop an optimum flow prediction model based on a data
mining process. The data mining process was applied to predict the river flow of Seyhan Stream in
the southern part of Turkey. Hydrological time series modeling was applied using monthly historical
flow records to predict Seyhan Stream flows. Seyhan Stream flows were modeled by Markov
models, and the series was found to follow an AR(2) model. Hence, the flows $F_{t-2}$ and $F_{t-1}$ in
months $(t-2)$ and $(t-1)$ were taken as inputs. For monthly streamflow predictions, data were taken from the General
Directorate of Electrical Power Resources Survey and Development Administration. The data
covered 35 years of monthly streamflows, from 1969 to 2003. Furthermore, to account for the effect of
monthly periodicity in the hydrological time series, $\cos(2\pi i/12)$ and $\sin(2\pi i/12)$ ($i = 1, 2, \ldots, 12$) were included
as inputs. Then, the flows $F_t$ in month $t$ were modeled by the data mining process. It was concluded that
by using the data mining process for streamflow prediction, it was possible to estimate missing or
unmeasured data.
M. Erol Keskin
Ecir Ugur Kucuksille
Faculty of Engineering-Architecture,
Suleyman Demirel University,
Isparta 32260,
Turkey
Dilek Taylan (corresponding author)
Faculty of Engineering,
Suleyman Demirel University,
Isparta 32260,
Turkey
E-mail: dilektaylan@sdu.edu.tr
Key words | AR models, data mining process, flow prediction
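The input layout described in the abstract (two lagged flows plus the periodic terms) can be sketched as follows. This is a minimal illustration, not the authors' code: the function name `build_inputs`, the `first_month` parameter, the sample flow values, and the choice to take $i$ as the calendar month of the predicted flow are all assumptions made here.

```python
import math

def build_inputs(flows, first_month=1):
    """Build (inputs, targets) for predicting F_t from F_{t-2}, F_{t-1}
    and the periodic terms cos(2*pi*i/12), sin(2*pi*i/12).

    Assumption: i is the calendar month (1..12) of the predicted flow,
    and flows[0] belongs to calendar month `first_month`.
    """
    inputs, targets = [], []
    for t in range(2, len(flows)):
        i = (first_month - 1 + t) % 12 + 1  # calendar month of flows[t]
        angle = 2.0 * math.pi * i / 12.0
        inputs.append([flows[t - 2], flows[t - 1],
                       math.cos(angle), math.sin(angle)])
        targets.append(flows[t])
    return inputs, targets

# Made-up monthly flows, for illustration only (not Seyhan Stream data).
flows = [10.0, 12.0, 20.0, 35.0, 30.0, 18.0]
X, y = build_inputs(flows, first_month=1)
```

Each row of `X` holds one input pattern and the matching entry of `y` holds the flow to be predicted; the first two months produce no row because their lagged flows are unavailable.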
INTRODUCTION
In the planning of water structures, future predictions based
on past records are necessary for the assessment of design
criteria. The identification of suitable generation models
for future streamflows is an important precondition for successful
planning and management of water resources.
Recently, the dominance of deterministic models in hydrology
has gradually weakened, as a number of factors
affect the formation of hydrological events; therefore, the
random nature of hydrological variables needs to be studied.
Hipel () showed that a simple stochastic approach gave
better results than a more complex deterministic model.
When available observation records are insufficient, generating
synthetic flow series can help a designer to carry out the
analysis. Stochastic models, which reproduce the essential
properties of the real process, are generally used for the generation
of synthetic series and the prediction of future flows.
Evaluating a large number of alternatives is necessary to
reduce risk. For instance, Sert () generated synthetic streamflows
in order to obtain input for a simulation model aimed at
operating the Keban–Karakaya–Atatürk Reservoir system and
investigated risks resulting from the stochastic character of
hydrological events. Generated series should maintain the
same statistical characteristics as the historical series, such as
the mean, standard deviation, skewness and autocorrelation
coefficient. In this study, an autoregressive (AR) model is
used for stochastic modeling of streamflow prediction.
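As a rough illustration of how an AR(2) model can generate synthetic flows, the sketch below estimates the two coefficients from the Yule-Walker equations and then simulates a new series. It is a textbook construction under stated assumptions, not the procedure used in this study: the function names, the Gaussian noise term, the burn-in length, and the sample data are all hypothetical. Gaussian noise preserves the mean, variance and low-lag autocorrelation in expectation but not the skewness, and monthly flows would normally be standardized month by month before fitting.

```python
import random

def autocorr(z, lag):
    """Sample autocorrelation of a zero-mean series z at the given lag."""
    num = sum(z[t] * z[t - lag] for t in range(lag, len(z)))
    den = sum(v * v for v in z)
    return num / den

def fit_ar2(flows):
    """Estimate mean, AR(2) coefficients and noise variance (Yule-Walker)."""
    mean = sum(flows) / len(flows)
    z = [v - mean for v in flows]
    r1, r2 = autocorr(z, 1), autocorr(z, 2)
    phi1 = r1 * (1.0 - r2) / (1.0 - r1 * r1)
    phi2 = (r2 - r1 * r1) / (1.0 - r1 * r1)
    var = sum(v * v for v in z) / len(z)
    # Guard against tiny negative values caused by rounding.
    noise_var = max(var * (1.0 - phi1 * r1 - phi2 * r2), 0.0)
    return mean, phi1, phi2, noise_var

def generate(mean, phi1, phi2, noise_var, n, seed=0, burn_in=200):
    """Simulate n synthetic values; the burn-in discards start-up effects."""
    rng = random.Random(seed)
    sd = noise_var ** 0.5
    z1 = z2 = 0.0  # deviations from the mean at t-1 and t-2
    out = []
    for k in range(n + burn_in):
        z = phi1 * z1 + phi2 * z2 + rng.gauss(0.0, sd)
        z2, z1 = z1, z
        if k >= burn_in:
            out.append(mean + z)
    return out

# Made-up historical record, for illustration only.
history = [3.0, 5.0, 4.0, 6.0, 8.0, 7.0, 5.0, 4.0, 6.0, 9.0, 8.0, 6.0]
mean, phi1, phi2, noise_var = fit_ar2(history)
synthetic = generate(mean, phi1, phi2, noise_var, n=24, seed=1)
```

Comparing the mean, standard deviation and lag-1 autocorrelation of `synthetic` with those of `history` is one simple way to check whether a generated series keeps the historical statistics.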
Data mining (DM) is a hybrid technique that integrates
technologies of databases, statistics, machine learning,
signal processing, and high-performance computing. This
emerging technology is motivated by the need for new techniques
to help analyze, understand or even visualize the
huge amounts of stored data gathered from business and
scientific applications. The major data mining functions
that are developed in the commercial and research communities
include summarization, association, classification,
prediction and clustering (Zhou ).
A good relational database management system will
form the core of the data repository, and adequately reflect
both the data structure and the process flow; therefore, the
database design would anticipate the kind of analysis and
78 © IWA Publishing 2013 Hydrology Research | 44.1 | 2013
doi: 10.2166/nh.2012.003