J.-S. Pan, S.-M. Chen, and N.T. Nguyen (Eds.): ICCCI 2010, Part II, LNAI 6422, pp. 153–162, 2010.
© Springer-Verlag Berlin Heidelberg 2010
A Query Answering Greedy Algorithm for Selecting
Materialized Views
T.V. Vijay Kumar and Mohammad Haider
School of Computer and Systems Sciences,
Jawaharlal Nehru University,
New Delhi-110067,
India
Abstract. Materialized views aim to improve the response time of analytical
queries posed on a data warehouse. This entails that they contain information
that provides answers to most future queries. The selection of such information
from the data warehouse is referred to as view selection. View selection deals
with selecting an appropriate set of views to improve query response time.
Several view selection algorithms exist in the literature, most of them
greedy-based. The greedy algorithm HRUA, which selects the top-k views from a
multidimensional lattice, is considered the most fundamental greedy-based
algorithm. It selects for materialization the views having the highest benefit,
computed in terms of size. Though the views selected using HRUA are beneficial with respect
to size, they may not account for a large number of future queries and may
hence become an unnecessary overhead. This problem is addressed by the
Query Answering Greedy Algorithm (QAGA) proposed in this paper. QAGA
uses both the size of the view, and the frequency of previously posed queries
answered by each view, to compute the profits of all views in each iteration.
Thereafter it selects, from among them, the most profitable view for
materialization. QAGA is able to select views which are beneficial with respect to size
and have a greater likelihood of answering future queries. Further, experimental
results show that QAGA, as compared to HRUA, is able to select views capable
of answering a greater number of queries. Though HRUA incurs a lower total
cost of evaluating all the views, QAGA has a lower total cost of answering all
the queries, leading to an improvement in the average query response time. This
in turn facilitates decision making.
Keywords: Data Warehouse, Materialized View Selection, Greedy Algorithm,
Query Response Time.
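The contrast drawn in the abstract between size-only and query-aware selection can be illustrated with a minimal sketch. The size-only benefit below is in the spirit of HRUA: for every view answerable from a candidate, it sums the reduction in the size of the smallest materialized view that can answer it. The frequency weighting is an assumption made purely for illustration. The abstract states that QAGA combines view size with the frequency of previously posed queries, but it does not give the profit formula, so `greedy_select`, `freq`, and the toy lattice are hypothetical and should not be read as the authors' definition.

```python
def greedy_select(sizes, answers, root, k, freq=None):
    """Greedily pick up to k views in addition to the always-materialized root.

    sizes:   view -> size in rows
    answers: view -> set of views (itself included) that it can answer
    freq:    optional view -> query frequency (ASSUMED weighting, not the
             paper's formula); None gives the size-only benefit
    """
    def weight(w):
        return 1 if freq is None else freq.get(w, 0)

    # cost[w] = size of the smallest materialized view able to answer w
    cost = {w: sizes[root] for w in sizes}
    selected = []
    for _ in range(k):
        best, best_gain = None, 0
        for v in sizes:
            if v == root or v in selected:
                continue
            # Saving obtained on every view that v can answer
            gain = sum(weight(w) * max(0, cost[w] - sizes[v])
                       for w in answers[v])
            if gain > best_gain:
                best, best_gain = v, gain
        if best is None:  # no remaining view gives a positive gain
            break
        selected.append(best)
        for w in answers[best]:
            cost[w] = min(cost[w], sizes[best])
    return selected


# Hypothetical two-dimensional lattice; sizes and frequencies are made up.
sizes = {"AB": 100, "A": 40, "B": 90, "NONE": 1}
answers = {
    "AB": {"AB", "A", "B", "NONE"},
    "A": {"A", "NONE"},
    "B": {"B", "NONE"},
    "NONE": {"NONE"},
}
freq = {"AB": 1, "A": 1, "B": 20, "NONE": 1}

print(greedy_select(sizes, answers, "AB", 1))        # -> ['A']
print(greedy_select(sizes, answers, "AB", 1, freq))  # -> ['B']
```

In this toy case the size-only benefit favors `A`, the smaller view, while the frequency-weighted variant favors `B`, the view that answers the frequently posed queries. This is the kind of difference the abstract attributes to taking query frequency into account.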
1 Introduction
Large amounts of data are continuously being generated by disparate data sources
spread across the globe. This data needs to be accessed and exploited by organizations
in order to have an edge over their competitors. One way to access this data is by
gathering data from disparate data sources, integrating and storing it in a repository
and then posing queries against the repository. This approach, referred to as the eager