J.-S. Pan, S.-M. Chen, and N.T. Nguyen (Eds.): ICCCI 2010, Part II, LNAI 6422, pp. 153–162, 2010. © Springer-Verlag Berlin Heidelberg 2010

A Query Answering Greedy Algorithm for Selecting Materialized Views

T.V. Vijay Kumar and Mohammad Haider

School of Computer and Systems Sciences, Jawaharlal Nehru University, New Delhi-110067, India

Abstract. Materialized views aim to improve the response time of analytical queries posed on a data warehouse. This requires that they contain information that provides answers to most future queries. The selection of such information from the data warehouse is referred to as view selection. View selection deals with selecting an appropriate set of views to improve the query response time. Several view selection algorithms exist in the literature, most of them greedy-based. The greedy algorithm HRUA, which selects the top-k views from a multidimensional lattice, is considered the most fundamental greedy-based algorithm. It selects views having the highest benefit, computed in terms of size, for materialization. Though the views selected using HRUA are beneficial with respect to size, they may not account for a large number of future queries and may hence become an unnecessary overhead. This problem is addressed by the Query Answering Greedy Algorithm (QAGA) proposed in this paper. QAGA uses both the size of the view and the frequency of previously posed queries answered by each view to compute the profits of all views in each iteration. Thereafter it selects, from among them, the most profitable view for materialization. QAGA is able to select views which are beneficial with respect to size and have a greater likelihood of answering future queries. Further, experimental results show that QAGA, as compared to HRUA, is able to select views capable of answering a greater number of queries.
Though HRUA incurs a lower total cost of evaluating all the views, QAGA has a lower total cost of answering all the queries, leading to an improvement in the average query response time. This in turn facilitates decision making.

Keywords: Data Warehouse, Materialized View Selection, Greedy Algorithm, Query Response Time.

1 Introduction

Large amounts of data are continuously being generated by disparate data sources spread across the globe. This data needs to be accessed and exploited by organizations in order to have an edge over their competitors. One way to access this data is by gathering data from disparate data sources, integrating and storing it in a repository and then posing queries against the repository. This approach, referred to as the eager