The VLDB Journal (2008) 17:683–702
DOI 10.1007/s00778-006-0032-z
REGULAR PAPER
DAWN: an efficient framework of DCT for data with error estimation
Ming-Jyh Hsieh · Wei-Guang Teng ·
Ming-Syan Chen · Philip S. Yu
Received: 20 October 2004 / Accepted: 10 November 2005 / Published online: 29 September 2006
© Springer-Verlag 2006
M.-J. Hsieh · W.-G. Teng · M.-S. Chen (✉)
Electrical Engineering Department,
National Taiwan University,
Taipei, Taiwan, ROC
e-mail: mschen@cc.ee.ntu.edu.tw
W.-G. Teng
Department of Engineering Science,
National Cheng Kung University,
Tainan City 701, Taiwan, ROC
P. S. Yu
IBM Thomas J. Watson Research Center, P.O. Box 704,
Yorktown, NY 10598, USA
e-mail: psyu@us.ibm.com
Abstract On-line analytical processing (OLAP) has
become an important component in most data warehouse
systems and decision support systems in recent years.
To deal with the huge amount of data, highly complex
queries and increasingly strict response time
requirements, approximate query processing has been
deemed a viable solution. Most works in this area,
however, focus on space efficiency and are unable to
provide quality-guaranteed answers to queries. To
remedy this, we propose an efficient framework of DCT
for dAta With error estimatioN, called DAWN, which
focuses on answering range-sum queries from compressed
OP-cubes transformed by DCT. Specifically, utilizing
the techniques of geometric series and Euler’s formula,
we devise a robust summation function, called the GE
function, to answer range queries in constant time,
regardless of the number of data cells involved. Note
that the GE function can estimate the summation of
cosine functions precisely; thus the quality of the
answers is superior to that of previous works.
Furthermore, an estimator of errors based on the Brown
noise assumption (BNA) is devised to provide tight
bounds for answering range-sum queries. Our
experimental results show that the DAWN framework is
scalable with respect to the selectivity of queries and
the available storage space. With GE functions and the
BNA method, the DAWN framework not only delivers
high-quality answers for range-sum queries, but also
leads to shorter query response time due to its
effectiveness in error estimation.
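The constant-time cosine summation mentioned in the abstract can be illustrated with a minimal sketch. It is not the paper's GE function, whose exact form is not given in this section. It shows only the standard identity that follows from a geometric series and Euler's formula, sum_{x=a}^{b} cos((2x+1)φ) = [sin(2(b+1)φ) − sin(2aφ)] / (2 sin φ), applied to a one-dimensional DCT-style basis with φ = uπ/(2N). The function names, the choice of a 1-D basis, and the tolerance 1e-12 are assumptions made for illustration.

```python
import math


def cosine_range_sum(a, b, phi):
    """Closed-form value of sum_{x=a}^{b} cos((2x + 1) * phi).

    Hypothetical helper: runs in constant time regardless of b - a.
    """
    s = math.sin(phi)
    if abs(s) < 1e-12:
        # phi is (numerically) a multiple of pi, so every term equals cos(phi).
        return (b - a + 1) * math.cos(phi)
    # Telescoping form obtained from the geometric series of e^{i*2*phi*x}.
    return (math.sin(2 * (b + 1) * phi) - math.sin(2 * a * phi)) / (2 * s)


def cosine_range_sum_naive(a, b, phi):
    """Term-by-term reference sum; cost grows with the range length."""
    return sum(math.cos((2 * x + 1) * phi) for x in range(a, b + 1))


if __name__ == "__main__":
    N = 16          # assumed length of the transformed dimension
    a, b = 3, 11    # assumed query range [a, b]
    for u in range(N):
        phi = u * math.pi / (2 * N)
        closed = cosine_range_sum(a, b, phi)
        naive = cosine_range_sum_naive(a, b, phi)
        print(u, round(closed, 6), round(naive, 6))
```

The closed form evaluates two sines and one division per frequency index u, so its cost does not depend on how many cells the range covers. The naive loop is included only as a check.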
1 Introduction
Approximate query processing has recently emerged
as a viable solution for dealing with the huge amount
of data, highly complex queries and increasingly strict
response time requirements that characterize today’s
decision-support-system (DSS) applications. In such
systems, users usually pose very complex queries to the
on-line analytical processing (OLAP) system. These
queries require complex operations over gigabytes of
data and take a very long time to produce exact
answers. Consequently, the issue of approximating OLAP
queries becomes critical. Answering range queries is
one of the primary tasks of OLAP applications. However,
datasets tend to be very large in real data warehousing
systems, so answering aggregate queries can be
computationally expensive. To address this issue,
providing approximate answers to online queries
efficiently is essential for cost/performance reasons.
In particular, one kind of data cube, called the
OP-cube (operational data cube), is widely applied in
information systems. An OP-cube is a data cube that
stores information for operational purposes rather than
demographic data. Examples of OP-cubes include the
sales of bookstores or department stores aggregated by
different branches, product categories and time intervals