Data mining process for modeling hydrological time series

M. Erol Keskin, Dilek Taylan and Ecir Ugur Kucuksille

ABSTRACT

The main purpose of this study was to develop an optimum flow prediction model based on the data mining process. The data mining process was applied to predict the river flow of Seyhan Stream in the southern part of Turkey. Hydrological time series modeling was applied using monthly historical flow records to predict Seyhan Stream flows. Seyhan Stream flows were modeled by Markov models and were found to fit an AR(2) model. Hence, the flows $F_{t-2}$ and $F_{t-1}$ in months $(t-2)$ and $(t-1)$ were taken as inputs. For monthly streamflow predictions, data were taken from the General Directorate of Electrical Power Resources Survey and Development Administration. The data covered 35 years of monthly streamflows, from 1969 to 2003. Furthermore, to account for the effect of monthly periodicity in the hydrological time series, $\cos(2\pi i/12)$ and $\sin(2\pi i/12)$ ($i = 1, 2, \ldots, 12$) were included as inputs. Then, the flows $F_t$ in month $t$ were modeled by the data mining process. It was concluded that, by using the data mining process for streamflow prediction, it was possible to estimate missing or unmeasured data.

M. Erol Keskin
Ecir Ugur Kucuksille
Faculty of Engineering-Architecture, Suleyman Demirel University, Isparta 32260, Turkey

Dilek Taylan (corresponding author)
Faculty of Engineering, Suleyman Demirel University, Isparta 32260, Turkey
E-mail: dilektaylan@sdu.edu.tr

Key words | AR models, data mining process, flow prediction

INTRODUCTION

In the planning of water structures, future predictions based on past records are necessary for the assessment of design criteria. The identification of suitable generation models for future streamflows is an important precondition for successful planning and management of water resources.
Recently, the dominance of deterministic models in hydrology has gradually weakened, as a number of factors affect the formation of hydrological events; therefore, the random nature of hydrological variables needs to be studied. Hipel () showed that a simple stochastic approach gave better results than a more complex deterministic model. When the available observation records are insufficient, generating synthetic flow series can help a designer to carry out the analysis. Stochastic models, which reproduce the essential properties of the real process, are generally used for the generation of synthetic series and the prediction of future flows. Evaluating a large number of alternatives is necessary to reduce risk. For instance, Sert () generated synthetic streamflows in order to obtain input for a simulation model aimed at operating the Keban–Karakaya–Atatürk Reservoir system and investigated the risks resulting from the stochastic character of hydrological events. Generated series should maintain the same statistical characteristics as the historical series, such as the mean, standard deviation, skewness and autocorrelation coefficient. In this study, an autoregressive (AR) model is used for stochastic modeling of streamflow prediction.

Data mining (DM) is a hybrid technique that integrates technologies of databases, statistics, machine learning, signal processing, and high-performance computing. This emerging technology is motivated by the need for new techniques to help analyze, understand or even visualize the huge amounts of stored data gathered from business and scientific applications. The major data mining functions that are developed in the commercial and research communities include summarization, association, classification, prediction and clustering (Zhou ).
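As a rough illustration of the stochastic modeling ideas in this section, the following Python sketch generates a synthetic AR(2) series, computes the four statistics that a generated series should preserve (mean, standard deviation, skewness and lag-1 autocorrelation coefficient), and arranges the lagged flows and monthly harmonic terms as model inputs. It is a minimal sketch under stated assumptions, not the procedure used in this study: the function names, the AR coefficients (0.6 and -0.2), the mean, the noise level and the Gaussian noise are all hypothetical, and the seasonal mean and non-negativity of real streamflows are ignored.

```python
import math
import random


def sample_stats(x):
    """Return (mean, standard deviation, skewness, lag-1 autocorrelation)."""
    n = len(x)
    mean = sum(x) / n
    dev = [v - mean for v in x]
    var = sum(d * d for d in dev) / n  # population variance
    std = math.sqrt(var)
    skew = (sum(d ** 3 for d in dev) / n) / std ** 3
    r1 = sum(dev[k] * dev[k + 1] for k in range(n - 1)) / (n * var)
    return mean, std, skew, r1


def generate_ar2(n, phi1, phi2, mean, sigma, seed=0, burn_in=200):
    """Generate n values of an AR(2) process around a constant mean:
    z_t = phi1 * z_{t-1} + phi2 * z_{t-2} + e_t, with x_t = mean + z_t.
    The first burn_in values are discarded to reduce start-up effects."""
    rng = random.Random(seed)
    z = [0.0, 0.0]
    for _ in range(n + burn_in):
        z.append(phi1 * z[-1] + phi2 * z[-2] + rng.gauss(0.0, sigma))
    return [mean + v for v in z[-n:]]


def build_inputs(flows, first_month=1):
    """Build rows (F_{t-2}, F_{t-1}, cos(2*pi*i/12), sin(2*pi*i/12)) with
    target F_t, where i in 1..12 is the calendar month of observation t."""
    rows, targets = [], []
    for t in range(2, len(flows)):
        i = (first_month - 1 + t) % 12 + 1
        angle = 2.0 * math.pi * i / 12.0
        rows.append((flows[t - 2], flows[t - 1],
                     math.cos(angle), math.sin(angle)))
        targets.append(flows[t])
    return rows, targets


# Hypothetical example: 35 years of monthly values (420 months).
synthetic = generate_ar2(n=420, phi1=0.6, phi2=-0.2, mean=100.0, sigma=10.0)
rows, targets = build_inputs(synthetic, first_month=1)
print("mean, std, skewness, r1:", sample_stats(synthetic))
print("first input row:", rows[0], "-> target:", targets[0])
```

In practice, the statistics of a generated series would be compared with those of the historical record, and the rows returned by `build_inputs` would be passed to whichever prediction method is chosen.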
A good relational database management system will form the core of the data repository, and adequately reflect both the data structure and the process flow; therefore, the database design would anticipate the kind of analysis and

© IWA Publishing 2013 Hydrology Research | 44.1 | 2013 doi: 10.2166/nh.2012.003