WATER RESOURCES RESEARCH, VOL. 16, NO. 1, PP. 77-96, FEBRUARY 1980
Extension and Application of Feature Prediction Model for Synthesis of Hydrologic Records
UMED SINGH PANU AND T. E. UNNY
Civil Engineering Department, University of Waterloo, Waterloo, Ontario, Canada N2L 3G1
The method described in this paper for the synthesis of streamflows differs from the traditional approaches in synthetic hydrology in the sense that it utilizes the information contained in or among the groups of data in a streamflow record. The existence of such groups in geophysical records, including hydrologic records, is well emphasized by Hurst (1951). Further, in the proposed method, based on concepts of pattern recognition, neither a basic structure nor any preconceived model is imposed on the data; rather the data are allowed to speak for themselves in a most 'democratic' way. The preliminary details of the method were provided in an earlier paper by Panu et al. (1978). The intent of this paper is to describe a procedure whereby it is possible to specify explicitly multivariate probability distribution for the intrapattern structure and first-order Markovian dependence for the interpattern structure in the feature prediction model (Panu et al., 1978). The various steps involved in the construction and operation of the model for streamflow synthesis are presented. The application of the model for synthesizing monthly streamflow records of three Canadian rivers exhibiting biannual cycles is explained. Statistical and hydrological tests show that these synthetic realizations possess relevant properties that are comparable with the corresponding properties contained in the historical record. This article should be read in conjunction with the previous publication by Panu et al. (1978).
INTRODUCTION
Synthesis of streamflows has been an active area of research ever since the pioneering efforts made by the Harvard research group.
Linear autoregressive models have been applied to hydrologic records consisting of annual, monthly, or daily values. Various versions of these models exist in the literature, differing only in the manner in which the nonstationarity and the periodicities are removed from the historical record. Although these models are simple in their formulation and are economical in their usage, they have been found inadequate to preserve the characteristics of the extreme events, floods and droughts. The inability of linear autoregressive models to preserve long-term dependence characteristics has led to the development of fractional Gaussian noise models and broken line models. These models, although they preserve the Hurst coefficient in the synthesized records, possess inadequacies in relation to the preservation of other characteristics of streamflows [Jackson, 1975; Lawrance and Kottegoda, 1977]. The development of disaggregation models has provided a promising avenue for streamflow synthesis. However, these models employ either fractional Gaussian noise models or broken line models for the synthesis of 'higher-level' values, thus inheriting the shortcomings attributed to these latter models. Since the publication of the book by Box and Jenkins [1970], there has been a surge in the application of ARMA models. In the hydrologic context the difficulties associated with these models are well emphasized by Rodriguez-Iturbe [1971], Wallis [1972], Kavvas and Delleur [1975], Delleur et al. [1976], Unny [1976], and Lawrance and Kottegoda [1977], among others. The dissatisfaction with conventional ARMA models has already encouraged Tao and Delleur [1976] to develop time-varying ARMA models to preserve the seasonally varying autocorrelations and Lettenmaier and Burges [1977] to develop Markov-ARMA models in order to preserve both short- and long-term persistence characteristics in synthesized streamflow records.
Long-term planning and design of water resources systems require samples of streamflow records based on data synthesis. These equiprobable samples provide an opportunity to consider all the different scenarios or possibilities at the design stage. On the other hand, real-time operation of water resources systems requires specification of expected values of streamflows in the immediate future on a step-ahead basis. Obviously, these two requirements are to a certain extent contradictory in nature, so that different models are required for these two purposes, though the historical data available are the same.
Copyright © 1980 by the American Geophysical Union. Paper number 9W1282. 0043-1397/80/009W-1282$01.00
The model discussed in this paper is found satisfactory for data synthesis. In this connection it is evident that any statistic, whether it be the expected value, the autocorrelation coefficient at lag l, the Hurst coefficient, or any other similarly estimated statistic, that is obtained by 'integration' across the length of the sample will be different from sample to sample, including the historical sample, though these variations will be distributed around the population value. In fact, from this point of view the determination and use of a population value are irrelevant in hydrologic data synthesis. Further, for example, ARMA models available in hydrology often use the sample statistics as rigidly fixed 'universal' (population) values in the development of synthetic realizations, thereby leading to unsatisfactory and irrational results. This criticism of ARMA models is specifically directed at the use or 'misuse' of these models for data extrapolation, leading to generation of equiprobable future samples. It should be remarked here that ARMA models have the greatest advantage in short-term forecasting. In fact, this was the primary objective of the development by Box and Jenkins [1970].
On the other hand, the distribution of a sample statistic could be helpful in determining a particular randomly drawn value for this statistic that is likely to occur in any synthesized sample. Because in most cases the distribution of these sample statistics is difficult to ascertain and, further, the various sample statistics may be jointly distributed, it is difficult to proceed from this point of view to construct a model for data synthesis. This difficulty can be overcome by propounding that all the interrelationships among data (representing statistics) are contained in the form or shape of the time wave structure, as demonstrated in this paper.
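The sample-to-sample variation of an estimated statistic, noted above, can be illustrated with a small numerical sketch. It is not part of the method of this paper: the first-order autoregressive generator, the sample length of 50, the population lag-one autocorrelation of 0.5, and all function names are assumptions chosen only for illustration. The sketch draws many equally long samples from one known process and estimates the mean and the lag-one autocorrelation coefficient of each.

```python
import random
import statistics


def ar1_sample(n, rho, rng):
    """Draw n values from a zero-mean, unit-variance AR(1) process (illustrative assumption)."""
    x = rng.gauss(0.0, 1.0)                  # start from the stationary distribution
    innov_sd = (1.0 - rho * rho) ** 0.5      # keeps the marginal variance equal to 1
    series = []
    for _ in range(n):
        series.append(x)
        x = rho * x + innov_sd * rng.gauss(0.0, 1.0)
    return series


def lag1_autocorr(series):
    """Estimate the lag-one autocorrelation coefficient from a single sample."""
    m = statistics.fmean(series)
    dev = [v - m for v in series]
    num = sum(a * b for a, b in zip(dev[:-1], dev[1:]))
    den = sum(d * d for d in dev)
    return num / den


def sample_statistics(n_samples=200, n=50, rho=0.5, seed=1):
    """Return the per-sample means and lag-one autocorrelations of n_samples samples."""
    rng = random.Random(seed)
    means, r1s = [], []
    for _ in range(n_samples):
        s = ar1_sample(n, rho, rng)
        means.append(statistics.fmean(s))
        r1s.append(lag1_autocorr(s))
    return means, r1s


if __name__ == "__main__":
    means, r1s = sample_statistics()
    print("lag-1 autocorrelation over samples: min %.2f, max %.2f, mean %.2f"
          % (min(r1s), max(r1s), statistics.fmean(r1s)))
    print("sample mean over samples: min %.2f, max %.2f" % (min(means), max(means)))
```

Under these assumptions every sample yields a different estimate even though the generating process is fixed, so the value computed from any one sample, the historical one included, is a single draw from the distribution of that statistic.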