Vol.16, No.7 ©2005 Journal of Software 软 件 学 报 1000-9825/2005/16(07)1252
数据流上的预测聚集查询处理算法∗
李建中+, 郭龙江, 张冬冬, 王伟平
(哈尔滨工业大学 计算机科学与技术学院,黑龙江 哈尔滨 150001)
Processing Algorithms for Predictive Aggregate Queries over Data Streams
LI Jian-Zhong+, GUO Long-Jiang, ZHANG Dong-Dong, WANG Wei-Ping
(Institute of Computer Science and Technology, Harbin Institute of Technology, Harbin 150001, China)
+ Corresponding author: Phn: +86-451-86415827, E-mail: lijzh@hit.edu.cn, http://db.cs.hit.edu.cn
Received 2004-05-17; Accepted 2005-02-03
Li JZ, Guo LJ, Zhang DD, Wang WP. Processing algorithms for predictive aggregate queries over data
streams. Journal of Software, 2005,16(7):1252−1261. DOI: 10.1360/jos161252
Abstract: Forecasting the future trend of data streams is important in many applications. For example, by issuing
predictive queries to a sensor network that monitors the environment, observers can forecast the future average
temperature and humidity in the area covered by the network and thus detect abnormal events. Recent work on query
processing over data streams has mainly focused on approximate queries over newly arriving data. To the best of our
knowledge, there is no work to date in the literature on predictive query processing over data streams. This paper
first proposes a predictive mathematical model, based on multivariable linear regression, for forecasting aggregate
values over data streams. Based on this model, a method for processing predictive aggregate queries over data
streams is then proposed. An adaptive strategy is also proposed that adjusts the predictive model when the frequency
of forecast failures exceeds a predefined threshold. A mathematical model that characterizes the effects of the
sliding-window update cycle and the data stream rate on predictive accuracy is presented as well.
Analytical and experimental results show that the proposed method is effective and that the proposed algorithms
achieve high performance and provide good predictions of aggregate values over data streams. The
experiments construct data streams from TPC-H data and from ocean air temperature data measured by TAO (tropical
atmosphere ocean).
Key words: data stream; future data window; multivariable linear regression; predictive aggregate queries
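摘要 (Abstract): Predicting the future trend of real-time data streams has important practical applications. For example,
in a sensor network for environmental monitoring, by running predictive aggregate queries over the sensed data, observers
can predict the average temperature and humidity of the area covered by the network over a coming period of time and
determine whether abnormal events will occur. Most current research focuses on queries over the current data of data
streams, and there has been little research on predictive queries over data streams. Adopting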
∗ Supported by the National Natural Science Foundation of China under Grant No.60473075 (国家自然科学基金); the Key Project
of the Natural Science Foundation of Heilongjiang Province under Grant No.ZJG03-05 (黑龙江省自然科学基金重点项目)
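About the authors: LI Jian-Zhong (1950-), male, born in Harbin, Heilongjiang, Ph.D., professor, doctoral supervisor, CCF senior member; his main research areas are databases and parallel computing.
GUO Long-Jiang (1973-), male, Ph.D. candidate, lecturer; his main research areas are databases, data streams, and sensor networks. ZHANG Dong-Dong (1976-), male, Ph.D. candidate; his main research area is distributed data
streams. WANG Wei-Ping (1975-), male, Ph.D. candidate; his main research area is parallel data streams.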