Vol.16, No.7 ©2005 Journal of Software 1000-9825/2005/16(07)1252

Processing Algorithms for Predictive Aggregate Queries over Data Streams

LI Jian-Zhong (李建中)+, GUO Long-Jiang (郭龙江), ZHANG Dong-Dong (张冬冬), WANG Wei-Ping (王伟平)

(Institute of Computer Science and Technology, Harbin Institute of Technology, Harbin 150001, China)

+ Corresponding author: Phone: +86-451-86415827, E-mail: lijzh@hit.edu.cn, http://db.cs.hit.edu.cn

Received 2004-05-17; Accepted 2005-02-03

Li JZ, Guo LJ, Zhang DD, Wang WP. Processing algorithms for predictive aggregate queries over data streams. Journal of Software, 2005,16(7):1252-1261. DOI: 10.1360/jos161252

Abstract: Forecasting the future trend of data streams is important in many applications. For example, by issuing predictive queries to a sensor network that monitors the environment, observers can forecast the future average temperature and humidity in the area covered by the network and so detect abnormal events. Recent work on query processing over data streams has mainly focused on approximate queries over newly arriving data. To the best of our knowledge, no prior work in the literature addresses predictive query processing over data streams. This paper first proposes a predictive mathematical model, based on multivariable linear regression, for forecasting aggregate values over data streams. Based on this model, it then proposes a method for processing predictive aggregate queries over data streams. An adaptive strategy adjusts the predictive model when the frequency of forecast failures exceeds a predefined threshold. A mathematical model that characterizes the effects of the sliding-window update cycle and the data stream rate on predictive accuracy is also presented.
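Analytical and experimental results show that the proposed method is effective: the proposed algorithms achieve higher performance and give users better predictions of aggregate values over data streams. In the experiments, TPC-H data and ocean air temperature data measured by TAO (Tropical Atmosphere Ocean) are used to construct the data streams.

Key words: data stream; future data window; multivariable linear regression; predictive aggregate queries

Abstract (Chinese version, translated): Forecasting the future trend of real-time data streams has important practical applications. For example, in an environment-monitoring sensor network, predictive aggregate queries over the sensed data let observers forecast the average temperature and humidity of the area covered by the network over a future period and determine whether abnormal events will occur. Current research mostly concentrates on queries over the current data of a stream; work on predictive queries over data streams remains scarce.

Supported by the National Natural Science Foundation of China under Grant No.60473075; the Key Project of the Natural Science Foundation of Heilongjiang Province under Grant No.ZJG03-05

About the authors: LI Jian-Zhong (1950-), a native of Harbin, Heilongjiang, Ph.D., professor, doctoral supervisor, CCF senior member; main research areas: databases and parallel computing. GUO Long-Jiang (1973-), Ph.D. candidate, lecturer; main research areas: databases, data streams, and sensor networks. ZHANG Dong-Dong (1976-), Ph.D. candidate; main research area: distributed data. WANG Wei-Ping (1975-), Ph.D. candidate; main research area: parallel data streams.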
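The idea summarized in the abstract (a regression-based forecast of an aggregate, refitted when forecasts fail too often) can be illustrated with a minimal sketch. Everything specific here is an assumption made for illustration and is not taken from the paper: the choice of the previous k window aggregates as regression variables, the relative-error test for a "failed" forecast, and the names `fit_model`, `predict_next`, `AdaptivePredictor`, `rel_tol` and `max_fail_rate`.

```python
import numpy as np

def fit_model(history, k):
    """Least-squares fit of each aggregate on the k aggregates before it.

    Assumption: lagged window aggregates serve as the regression variables.
    """
    rows = [history[i:i + k] for i in range(len(history) - k)]
    X = np.hstack([np.ones((len(rows), 1)), np.array(rows, dtype=float)])
    y = np.array(history[k:], dtype=float)
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    return coef  # coef[0] is the intercept, coef[1:] weight the k lags

def predict_next(coef, recent):
    """Forecast the next aggregate from the k most recent aggregates."""
    return float(coef[0] + np.dot(coef[1:], recent))

class AdaptivePredictor:
    """Hypothetical wrapper: refits when the failure frequency is too high."""

    def __init__(self, history, k=3, rel_tol=0.05, max_fail_rate=0.3):
        self.history = list(history)
        self.k = k
        self.rel_tol = rel_tol              # allowed relative forecast error
        self.max_fail_rate = max_fail_rate  # predefined failure threshold
        self.coef = fit_model(self.history, k)
        self.failures = 0
        self.forecasts = 0
        self.refits = 0

    def forecast(self):
        return predict_next(self.coef, self.history[-self.k:])

    def observe(self, actual):
        """Compare the forecast with the real aggregate, then adapt if needed."""
        predicted = self.forecast()
        self.forecasts += 1
        if abs(predicted - actual) > self.rel_tol * max(abs(actual), 1e-12):
            self.failures += 1
        self.history.append(actual)
        if self.failures / self.forecasts > self.max_fail_rate:
            # Too many failed forecasts: rebuild the model on all data so far.
            self.coef = fit_model(self.history, self.k)
            self.failures = 0
            self.forecasts = 0
            self.refits += 1
        return predicted

# Window averages with a steady upward trend (illustrative values only).
predictor = AdaptivePredictor([10, 12, 14, 16, 18, 20, 22, 24], k=2)
next_avg = predictor.forecast()
```

In this sketch each call to `observe` corresponds to one sliding-window update: the real aggregate for the window arrives, is compared with the earlier forecast, and triggers a refit only when the observed failure rate passes the threshold.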