Fighting Redundancy in SQL: the For-Loop Approach * Antonio Badia and Dev Anand Computer Engineering and Computer Science department University of Louisville, Louisville KY 40292 July 8, 2004 1 Introduction SQL is the standard query language for relational databases. However, it has some limitations, especially in areas like Decision Support, that have been noted in the literature ([18, 13]). In this paper, we study a class of Decision-Support SQL queries, characterize them and show how to process them in an improved manner. In particular, we analyze queries containing subqueries, where the subquery returns a single result (i.e. it has an aggregate function on its SELECT clause). These are called type-A and type-JA in [17]. In many of these queries, SQL exhibits redundancy in that FROM and WHERE clauses of query and subquery show a great deal of overlap. We argue that these patterns are currently not well supported by relational query processors. In particular, we show that more than one pass over the base relations in the database is necessary in order to compute the answer for such queries with traditional optimization techniques. However, this is not strictly necessary. We call this situation the two-pass problem. The following example gives some intuition about our proposal. Example 1 The TPC-H benchmark ([33]) is a popular reference point for Decision Support; it defines a data warehouse schema and a set of queries. The schema contains two large fact tables and a series of dimension tables which have been normalized (i.e. it’s a snowflake schema). Query 2 is a typical query which shows a great deal of overlap between query and subquery: select s_acctbal, s_name, n_name, p_partkey, p_mfgr, s_address, s_phone, s_comment from part, supplier, partsupp, nation, region where p_partkey = ps_partkey and s_suppkey = ps_suppkey and p_size = 15 and p_type like ’%BRASS’ and r_name = ’EUROPE’ and s_nationkey = n_nationkey and n_regionkey = r_regionkey and ps_supplycost = (select min(ps_supplycost) from partsupp, supplier, nation, region where p_partkey = ps_partkey and s_suppkey = ps_suppkey and s_nationkey = n_nationkey and n_regionkey = r_regionkey and r_name = ’EUROPE’) order by s_acctbal desc, n_name, s_name, p_partkey; This query is executed in most systems by using unnesting techniques. However, the commonality between query and subquery will not be detected, and all operations (including common joins and selections) will be repeated (see an in-depth discussion of this example in subsection 5.1). Our goal is to avoid duplication of effort. * This research was sponsored by NSF under grant IIS-0091928. A full version of this paper is available as a technical report at http://date.spd.louisville.edu/badia/forloop.html. 1