Data mining process for modeling hydrological time series

M. Erol Keskin, Dilek Taylan and Ecir Ugur Kucuksille

ABSTRACT

The main purpose of this study was to develop an optimum flow prediction model based on the data mining process. The data mining process was applied to predict the river flow of Seyhan Stream in the southern part of Turkey. Hydrological time series modeling was applied using monthly historical flow records to predict Seyhan Stream flows. Seyhan Stream flows were modeled by Markov models and were found to fit an AR(2) model. Hence, the flows $F_{t-2}$ and $F_{t-1}$ in months $(t-2)$ and $(t-1)$ were taken as inputs. For monthly streamflow predictions, data were taken from the General Directorate of Electrical Power Resources Survey and Development Administration. The data covered 35 years of monthly streamflows, from 1969 to 2003. Furthermore, to account for monthly periodicity in the hydrological time series, $\cos(2\pi i/12)$ and $\sin(2\pi i/12)$ ($i = 1, 2, \ldots, 12$) were included as inputs. Then, the flows $F_t$ in month $t$ were modeled by the data mining process. It was concluded that, by using the data mining process for streamflow prediction, it was possible to estimate missing or unmeasured data.

M. Erol Keskin, Ecir Ugur Kucuksille
Faculty of Engineering-Architecture, Suleyman Demirel University, Isparta 32260, Turkey

Dilek Taylan (corresponding author)
Faculty of Engineering, Suleyman Demirel University, Isparta 32260, Turkey
E-mail: dilektaylan@sdu.edu.tr

Key words | AR models, data mining process, flow prediction

INTRODUCTION

In the planning of water structures, future predictions based on past records are necessary for the assessment of design criteria. The identification of suitable generation models for future streamflows is an important precondition for successful planning and management of water resources.
Recently, the dominance of deterministic models in hydrology has gradually weakened, because a number of factors affect the formation of hydrological events; therefore, the random nature of hydrological variables needs to be studied. Hipel () showed that a simple stochastic approach gave better results than a more complex deterministic model. When the available observation records are insufficient, generating synthetic flow series can help a designer carry out the analysis. Stochastic models, which reproduce the essential properties of the real process, are generally used for the generation of synthetic series and the prediction of future flows. Evaluating a large number of alternatives is necessary to reduce risk. For instance, Sert () generated synthetic streamflows in order to obtain input for a simulation model aimed at operating the Keban-Karakaya-Atatürk Reservoir system and investigated the risks resulting from the stochastic character of hydrological events. Generated series should maintain the same statistical characteristics as the historical series, such as the mean, standard deviation, skewness and autocorrelation coefficient. In this study, an autoregressive (AR) model is used for stochastic modeling of streamflow prediction.

Data mining (DM) is a hybrid technique that integrates technologies from databases, statistics, machine learning, signal processing, and high-performance computing. This emerging technology is motivated by the need for new techniques to help analyze, understand or even visualize the huge amounts of stored data gathered from business and scientific applications. The major data mining functions developed in the commercial and research communities include summarization, association, classification, prediction and clustering (Zhou ).
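The input structure described in the abstract (two lagged flows plus one annual sine/cosine pair) can be illustrated with a short sketch. Everything here is an assumption made for illustration: the series is synthetic and is not Seyhan Stream data, the function names `build_inputs` and `fit_linear` are hypothetical, and an ordinary least-squares fit stands in for the data mining models, which this part of the paper does not specify.

```python
import numpy as np


def build_inputs(flows, first_month=1):
    """Build inputs [F_{t-2}, F_{t-1}, cos(2*pi*i/12), sin(2*pi*i/12)] and target F_t.

    `first_month` is the calendar month (1..12) of flows[0]; this argument is
    an assumption of the sketch, not something stated in the paper.
    """
    flows = np.asarray(flows, dtype=float)
    t = np.arange(2, len(flows))              # months with two predecessors
    month = (first_month - 1 + t) % 12 + 1    # calendar month i in 1..12
    angle = 2.0 * np.pi * month / 12.0
    X = np.column_stack([flows[t - 2], flows[t - 1], np.cos(angle), np.sin(angle)])
    y = flows[t]
    return X, y


def fit_linear(X, y):
    """Least-squares fit with an intercept (a stand-in model, assumed here)."""
    A = np.column_stack([np.ones(len(X)), X])
    coef, *_ = np.linalg.lstsq(A, y, rcond=None)
    return coef


# Synthetic 35-year monthly series (illustrative values only).
rng = np.random.default_rng(0)
n_months = 35 * 12
flows = np.zeros(n_months)
for k in range(2, n_months):
    i = k % 12 + 1
    seasonal = 20.0 * np.cos(2.0 * np.pi * i / 12.0)
    flows[k] = 50.0 + 0.5 * flows[k - 1] + 0.2 * flows[k - 2] + seasonal + rng.normal(0.0, 5.0)

X, y = build_inputs(flows, first_month=1)
coef = fit_linear(X, y)

# In-sample predictions and root-mean-square error of the fitted model.
pred = np.column_stack([np.ones(len(X)), X]) @ coef
rmse = float(np.sqrt(np.mean((y - pred) ** 2)))
print("coefficients [intercept, F_{t-2}, F_{t-1}, cos, sin]:", coef)
print("in-sample RMSE:", rmse)
```

Because the first two months have no complete lag pair, a 420-month record yields 418 usable rows. Any regression or data mining model that accepts a feature matrix could replace `fit_linear` without changing `build_inputs`.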
A good relational database management system will form the core of the data repository, and adequately reflect both the data structure and the process flow; therefore, the database design would anticipate the kind of analysis and

© IWA Publishing 2013 Hydrology Research | 44.1 | 2013 doi: 10.2166/nh.2012.003