A Rule Engine for Query Transformation in Starburst and IBM DB2 C/S DBMS Hamid Pirahesh, T.Y. CliffLeung, Waqar Hasan IBM Almaden Research Center {pirahesh,cleung} @almaden.ibm.corn, hasan@informix.com Abstract The complexi~ of queries in relational DBMSs is increas- ing particularly in the decision support area and interactive client server environments. This calls for a more power- ful and flexible optimization of complex queries. In [19] we introduced query rewrite as a distinct query optimiza- tion phase mainly targeted to responding to this require- ment. This approach has enabled us to extensively enrich the optimization rides in our system. Further, it has made it easier to incrementally enrich and adapt the system as need arises. Examples of such query optimizations are predicate pushdown, subquery and magic sets transformations, and decorrelating subqueries. In this paper we describe the de- sign and implementation of a rule engine for query rewrite optimization. Each transformation is implemented as a rule which consists of a pair of rule condition and action [19]. Rules can be grouped into rule classes for higher efficiency, better understandability and more extensibility. The rule engine has a number of novelties in that it supports a full spectrum of control --from totally data-driven to totally procedural. Furthermore, it incorporates a budget control scheme for controlling the resources taken for query opti- mization as well as guaranteeing the termination of rule ex- ecution. The rule engine and a suite of query rewrite rules have been implemented in Starburst relational DBMS pro- totype and a significant portion of this technology has been integrated into IBM DB2 Common Server relational DBMS. 1 Introduction There is an increasing demand on efficient handling of com- plex queries in relational DBMS. Such demands have in- creased rapidly in recent years due to advancement of GUI in query managers. Often, such GUIs automatically gen- erate SQL queries, enabling users to pose very complex queries. Such machine generated queries are not hand- optimized by programmers, hence demanding better query optimization within the DBMS. In Starburst [12] 1, we have implemented a distinct opti- mization phase, called query rewrite, in direct response to this problem. We have chosen a rule based system for query rewrite optimization. Such a system allows us to keep the repertoire of optimization methods open-ended, consider- ably reducing the effort it usually takes to enrich the opti- mization repertoire as the user requirements evolve. Provid- ing such a flexibility must not be at the expense of optimiza- tion time. Often in interactive environments, the queries are ad-hoc, requiring on-the-fly query optimization. This de- mands a high performance rule system since the time spent in query rewrite rule inferencing is part of the query re- sponse time. The growing list of rewrite rules implemented in this system includes predicate pushdown, subquery to join transformation, magic sets transformation, handling of duplicates, merging of views, and decorrelating complex subqueries [16, 17, 15, 19, 22, 18, 4, 23]. These transforma- tions often lead to orders of magnitude improvement in per- formance, making query rewrite an important component of our system. In this paper we describe the design and implementation of a rule engine for query rewrite optimization in Starburst. Each transformation is implemented as a rule which con- sists of a pair of rule condition and action [ 19]. Rules can be grouped into rule classes. We have opted for building a new rule engine so that (1) it is fast enough for interac- tive queries; (2) it can handle the complexity of our rewrite rules; (3) it can suitably be made an integrated component of a relational DBMS architecture. In our design, we chose a rule based architecture because it allows us to achieve an extensible design [3, 7, 2, 8, 11, 9, 1, 5, 10]. For query rewrite, the important advantages for using a rule based system are modifiability (and thus exten- sibility) and explanation. For example, it is important to be able to add a new query rewrite rule into the system without modifying the existing rules, to have an explanation of how the rule system arrives at a solution for a given query, and to influence the way the rule engine does its inferencing for 1In the rest of the paper,a reference to Starburst is implicitly a reference to IBM DB2 CommonServerVersion2 DBMSunlessotherwisenoted. 1063-6382/97 $10.00 © 1997 IEEE 391