The VLDB Journal (2008) 17:683–702
DOI 10.1007/s00778-006-0032-z

REGULAR PAPER

DAWN: an efficient framework of DCT for data with error estimation

Ming-Jyh Hsieh · Wei-Guang Teng · Ming-Syan Chen · Philip S. Yu

Received: 20 October 2004 / Accepted: 10 November 2005 / Published online: 29 September 2006
© Springer-Verlag 2006

M.-J. Hsieh · W.-G. Teng · M.-S. Chen (✉)
Electrical Engineering Department, National Taiwan University, Taipei, Taiwan, ROC
e-mail: mschen@cc.ee.ntu.edu.tw

W.-G. Teng
Department of Engineering Science, National Cheng Kung University, Tainan City 701, Taiwan, ROC

P. S. Yu
IBM Thomas J. Watson Research Center, P.O. Box 704, Yorktown, NY 10598, USA
e-mail: psyu@us.ibm.com

Abstract On-line analytical processing (OLAP) has become an important component of most data warehouse systems and decision support systems in recent years. Approximate query processing has been deemed a viable solution for coping with huge amounts of data, highly complex queries, and increasingly strict response-time requirements. Most prior work in this area, however, focuses on space efficiency and cannot provide quality-guaranteed answers to queries. To remedy this, we propose an efficient framework of DCT for dAta With error estimatioN, called DAWN. DAWN answers range-sum queries from compressed OP-cubes transformed by the discrete cosine transform (DCT). Specifically, we use geometric series and Euler's formula to devise a robust summation function, called the GE function, which answers range queries in constant time regardless of the number of data cells involved. Because the GE function evaluates the summation of cosine functions precisely, the quality of its answers is superior to that of previous work. Furthermore, we devise an error estimator based on the Brown noise assumption (BNA) that provides tight bounds for answers to range-sum queries.
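The constant-time claim rests on a standard identity. By Euler's formula, cos(t) = Re(e^{it}), so a run of equally spaced cosine terms forms a geometric series with a closed-form sum. The sketch below illustrates that idea under our own assumptions: 1-D data, an orthonormal DCT-II, and compression by keeping only the leading coefficients. The helper names `cosine_range_sum`, `dct2` and `range_sum_from_dct` are hypothetical, and the code is not the authors' GE function or their multi-dimensional OP-cube setting.

```python
import cmath
import math


def cosine_range_sum(a: float, d: float, l: int, r: int) -> float:
    """Return sum(cos(a + x*d) for x in l..r) in O(1) time.

    By Euler's formula cos(t) = Re(e^{it}), so the terms form a geometric
    series with ratio q = e^{id}; its closed form avoids a loop over x.
    """
    n = r - l + 1
    if n <= 0:
        return 0.0
    q = cmath.exp(1j * d)
    first = cmath.exp(1j * (a + l * d))  # first term e^{i(a + l*d)}
    if abs(q - 1) < 1e-12:
        # Ratio 1 (d a multiple of 2*pi): all n terms equal cos(a + l*d).
        return n * first.real
    qn = cmath.exp(1j * n * d)  # q**n computed directly
    return (first * (1 - qn) / (1 - q)).real


def dct2(f: list[float]) -> list[float]:
    """Orthonormal 1-D DCT-II, O(N^2), for illustration only."""
    N = len(f)
    coeffs = []
    for u in range(N):
        alpha = math.sqrt((1 if u == 0 else 2) / N)
        coeffs.append(alpha * sum(
            f[x] * math.cos(math.pi * (2 * x + 1) * u / (2 * N))
            for x in range(N)))
    return coeffs


def range_sum_from_dct(coeffs: list[float], N: int, l: int, r: int) -> float:
    """Estimate sum(f[l..r]) from the retained DCT-II coefficients.

    Each basis function cos(pi*(2x+1)*u/(2N)) equals cos(a + x*d) with
    a = pi*u/(2N) and d = pi*u/N, so its range sum is one O(1) call.
    The total cost is O(len(coeffs)), independent of r - l.
    """
    total = 0.0
    for u, F in enumerate(coeffs):
        alpha = math.sqrt((1 if u == 0 else 2) / N)
        total += alpha * F * cosine_range_sum(
            math.pi * u / (2 * N), math.pi * u / N, l, r)
    return total


if __name__ == "__main__":
    data = [3.0, 5.0, 4.0, 6.0, 8.0, 7.0, 9.0, 10.0]
    coeffs = dct2(data)
    print(sum(data[2:6]))                                   # exact range sum
    print(range_sum_from_dct(coeffs, len(data), 2, 5))      # all N coefficients
    print(range_sum_from_dct(coeffs[:3], len(data), 2, 5))  # only 3 kept
```

With all N coefficients the transform is invertible, so the second value matches the exact sum up to floating-point rounding. With only the first few coefficients kept, it becomes an approximation. Either way, each query costs one closed-form evaluation per retained coefficient rather than one operation per data cell. How DAWN chooses coefficients and bounds the resulting error with BNA is not modeled here.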
Our experimental results show that the DAWN framework scales well with both the selectivity of queries and the available storage space. With GE functions and the BNA method, the DAWN framework not only delivers high-quality answers to range-sum queries but also shortens query response time, owing to its effective error estimation.

1 Introduction

Approximate query processing has recently emerged as a viable solution for dealing with the huge amounts of data, highly complex queries, and increasingly strict response-time requirements that characterize today's decision support system (DSS) applications. In such systems, users often pose very complex queries to the on-line analytical processing (OLAP) system. These queries require complex operations over gigabytes of data and take a very long time to produce exact answers. Consequently, approximating OLAP queries has become critical. Answering range queries is one of the primary tasks of OLAP applications. However, datasets in real data warehousing systems tend to be very large, so answering aggregate queries can be computationally expensive. Efficiently providing approximate answers to online queries is therefore essential for balancing cost and performance. In particular, one kind of data cube, called the OP-cube (operational data cube), is widely applied in information systems. An OP-cube is a data cube that stores information for operational purposes rather than demographic data. Examples of OP-cubes include the sales of bookstores or department stores aggregated by different branches, product categories and time intervals