Decomposition of SBQL Queries for Optimal Result Caching

Piotr Cybula
Institute of Mathematics and Computer Science, University of Lodz, Poland
Email: cybula@math.uni.lodz.pl

Kazimierz Subieta
Institute of Computer Science, Polish Academy of Sciences, Poland
Polish-Japanese Institute of Information Technology, Warsaw, Poland
Email: subieta@ipipan.waw.pl

Abstract: We present a new approach to the optimization of query languages using cached results of previously evaluated queries. It is based on the stack-based approach (SBA), which describes semantics in the form of an abstract implementation of query/programming language constructs. The pragmatic universality of the object-oriented query language SBQL and its precise, formal operational semantics make it possible to investigate various crucial issues related to this kind of optimization. Two main issues arise: the first is a strategy for fast retrieval and high reuse of cached queries; the second is the development of fast methods to recognize and maintain the consistency of query results after database updates. This paper focuses on the first issue. We introduce data structures and algorithms for optimal, fast and transparent utilization of the result cache, involving methods of query normalization that preserve the original query semantics and the decomposition of complex queries into smaller ones. We present experimental results of the optimization that demonstrate the effectiveness of our technique.

I. INTRODUCTION

Caching the results of previously evaluated queries seems to be an obvious method of query optimization. It assumes that there is a relatively high probability that the same query will be issued again by the same or another application, so that the cached result can be reused instead of evaluating the query again. There are many cases in which such an optimization strategy makes sense.
This concerns environments where data are not updated or are updated infrequently (say, one update per 100 retrieval operations). Examples are data warehouses (OLAP applications), various kinds of archives, operational databases, knowledge bases, decision support systems, etc.

Conceptually, the cache can be understood as a two-column table, where one column contains cached queries in some internal format (e.g. normalized syntactic query trees) and the second column contains query results. A query result can be stored as a collection of OIDs, but for special purposes it can also be stored, e.g., as an XML file enabling quick reuse in Web applications. A cached query is created as a side effect of the normal evaluation of a user query. Transparency is the most essential property of a cached query: programmers need not include explicit operations on cached results in an application program. In contrast to other query optimization methods, which strongly depend on the semantics of a particular query, the query caching method is independent of the query type, its complexity and the current database state.

Our research is done within the stack-based approach (SBA) to object-oriented query/programming languages. SBA is a formal theory and a universal conceptual framework for such languages; it thus allows precise reasoning about various aspects of cached queries, in particular query semantics, query decomposition, query indexing in the cache, and so on. We have implemented the caching methods as part of the optimizer developed for the query language SBQL in our most recent project, ODRA (Object Database for Rapid Application development), devoted to Web and grid applications [1]. In [2] we described how query caching can be used to enhance the performance of applications operating on grids.

There are two key aspects concerning the development of database query optimization using cached queries.
The first concerns the organization of the cache so that it enables fast retrieval of cached queries (for selecting the optimal queries and rewriting new queries to use cached results) and optimal, fast and transparent utilization of the cache. This involves methods of query normalization that preserve the original query semantics (enabling higher reuse of cached queries for semantically equivalent but syntactically different queries), decomposition of complex queries into smaller ones, and maintenance of the assigned resources by removing rarely used results. The second concerns the development of fast methods to recognize the consistency of cached queries and to automatically and incrementally alter cached query results after database updates (in some cases by removing or recalculating them). In this paper we deal mainly with the first aspect of the optimization method. The second aspect is discussed extensively in [3], [4].

The paper is organized as follows. Section II discusses known solutions that are related to the contributions of the paper. Section III briefly presents the Stack-Based Approach. Section IV briefly describes the architecture of the caching query optimizer. Sections V and VI describe the optimization strategies: query normalization, decomposition and rewriting rules. Section VII presents experimental results and Section VIII concludes.

Proceedings of the Federated Conference on Computer Science and Information Systems, pp. 841–848. ISBN 978-83-60810-22-4. 978-83-60810-22-4/$25.00 © 2011 IEEE
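As an illustration of the two-column cache described in the Introduction, the following Python sketch shows one way a transparent result cache might look. It is a minimal sketch under stated assumptions, not the ODRA implementation: the names `ResultCache`, `normalize` and `invalidate_all` are hypothetical, queries are plain strings rather than normalized syntactic query trees, normalization is reduced to collapsing whitespace, and consistency after updates is handled by discarding the whole cache rather than by incremental maintenance.

```python
from typing import Callable, Dict, List

OID = int  # hypothetical stand-in for an object identifier


def normalize(query: str) -> str:
    """Toy normalization (assumption): collapse runs of whitespace only.

    A real normalizer would work on syntax trees and map semantically
    equivalent queries to one canonical form.
    """
    return " ".join(query.split())


class ResultCache:
    """Two-column table: normalized query -> cached result (list of OIDs)."""

    def __init__(self, evaluate: Callable[[str], List[OID]]) -> None:
        self._evaluate = evaluate            # the ordinary query evaluator
        self._table: Dict[str, List[OID]] = {}
        self.hits = 0
        self.misses = 0

    def query(self, text: str) -> List[OID]:
        """Answer a query; callers never touch the cache explicitly."""
        key = normalize(text)
        if key in self._table:
            self.hits += 1
            return self._table[key]
        # Cache entry is created as a side effect of normal evaluation.
        self.misses += 1
        result = self._evaluate(text)
        self._table[key] = result
        return result

    def invalidate_all(self) -> None:
        """Crude consistency policy (assumption): drop everything on update."""
        self._table.clear()
```

An application would wrap its evaluator once, for example `cache = ResultCache(evaluator)`, and then call `cache.query(...)` wherever it previously called the evaluator directly. Two query strings that differ only in spacing share one cache entry here, which hints at why normalization raises the reuse rate.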