Proceedings of the International Multiconference on Computer Science and Information Technology, pp. 643-650
ISBN 978-83-60810-22-4, ISSN 1896-7094

Abstract: When a query jointly addresses very large and very small collections, an iteration caused by a query operator may be driven by the large collection while each cycle evaluates a subquery that depends only on an element of the small collection. For each such element the subquery returns the same result, so the subquery is unnecessarily evaluated many times. The optimization rewrites such a query to reverse the situation: the loop is performed over the small collection, and in each of its cycles a subquery addressing the large collection is evaluated. We illustrate the method on comprehensive examples and then present the general rewriting rule. The research follows the Stack-Based Approach to query languages, which has its roots in the semantics of programming languages. The optimization method consists in analyzing the scoping and binding rules for names occurring in queries.

I. INTRODUCTION

In two big European projects, eGov Bus [7] and VIDE [25], we have implemented object-oriented programming languages integrated with database queries. Both implemented query languages, SBQL and OCL, are supported by an advanced query optimizer. In this paper we briefly present these projects and describe some of the query optimization methods that we have implemented. We propose a new, powerful method that has not yet been presented in any source. The method is applicable when a query jointly addresses very large and very small collections. It is a generalization of previously introduced methods ([6], [18], [19] and [23]).

In the eGov Bus project we have implemented the system ODRA (Object Database for Rapid Application Development) ([1], [12] and [17]), which has many features aimed at business-oriented application programming.
In particular, we have implemented SBQL (Stack-Based Query Language) ([2], [17], [23] and [24]), which evolved from a pure database query language into a fully-fledged object-oriented programming language with many advanced features, such as a UML-like object model, processing of semi-structured data, collections constrained by cardinalities, semi-strong static type checking ([8] and [21]), updateable virtual views, transitive closures, fixed-point equations, seamless integration of heterogeneous resources (XML, relational databases, Web Services), and others [17]. OMG considers SBQL a departure point for the new 4th generation object database standard for the software industry [16].¹

The VIDE project aimed at implementing the OMG MDA (Model Driven Architecture) [13] paradigm through both visual and textual programming capabilities. VIDE introduces several original ideas in comparison with other implementations of MDA. The most important novelty is the support for programming on the PIM (Platform Independent Model) level. Hence we have implemented a PIM-level programming language that can be used to write, test, debug and execute business-oriented applications. After an application has been developed on the PIM level, the system can generate code for the PSM (Platform Specific Model) level. We have provided model compilers for two PSMs: J2EE and ODRA. The PIM-level programming language is based on UML 2.1 (aka Executable UML) [15] and OCL 2.0 [14]. OCL (Object Constraint Language) was originally devoted to the specification of constraints (preconditions and postconditions); its developers did not intend it to be a database query language. Our implementation is the first attempt to use OCL in this role as well. OCL expressions can be used within imperative statements; for instance, they can determine both the left and right sides of assignments.
As a query language, OCL must be supported by a powerful query optimizer; otherwise users would reject it for low performance. This is why we treat query optimization very seriously, both for OCL and for SBQL. Although OCL and SBQL seem to be very different languages (OCL has its roots in formal logic, while SBQL extends the classical line of programming languages), it turned out that they share a common semantic core. Currently OCL is implemented in such a way that OCL queries generate SBQL abstract syntax trees (ASTs). These are then processed by a strong type checker, a query optimizer and a code generator. In effect, all optimization methods that we have developed for SBQL are valid for OCL. In this paper, to explain the optimization methods, we use SBQL rather than

¹ Since 2006 the Polish-Japanese Institute of Information Technology has been a member of OMG.

Optimization of Object-Oriented Queries Addressing Large and Small Collections

Michał Bleja
Faculty of Mathematics and Computer Science, Łódź University, Banacha 22, 90-238 Łódź, Poland
Email: blejam@math.uni.lodz.pl

Krzysztof Stencel
Institute of Informatics, Warsaw University, Banacha 2, 02-097 Warsaw, Poland
Email: stencel@mimuw.edu.pl

Kazimierz Subieta
Polish-Japanese Institute of Information Technology, Koszykowa 86, 02-008 Warsaw, Poland; Institute of Computer Science PAS, Ordona 21, 01-237 Warsaw, Poland
Email: subieta@pjwstk.edu.pl
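
The idea stated in the abstract, reversing a loop so that it is driven by the small collection, can be illustrated outside SBQL by a minimal Python sketch. It is not the authors' rewriting rule or implementation. The schema (`Dept`, `Emp`), the sample data, the helper `avg_salary` and the call counter `stats` are all hypothetical names introduced only for this illustration, and the sketch assumes that every employee's department occurs in the small collection.

```python
from collections import namedtuple

# Hypothetical schema (not from the paper): a small collection of
# departments and a potentially very large collection of employees.
Dept = namedtuple("Dept", ["name"])
Emp = namedtuple("Emp", ["name", "dept", "salary"])

DEPTS = [Dept("IT"), Dept("HR")]
EMPS = [
    Emp("Ann", "IT", 5000),
    Emp("Bob", "IT", 3000),
    Emp("Cid", "HR", 2000),
    Emp("Dee", "HR", 4000),
    Emp("Eve", "IT", 4000),
]


def avg_salary(dept_name, emps, stats):
    """Subquery: average salary in one department (None if it is empty)."""
    stats["subquery_calls"] += 1  # count how often the subquery is evaluated
    salaries = [e.salary for e in emps if e.dept == dept_name]
    return sum(salaries) / len(salaries) if salaries else None


def naive(emps, stats):
    """Loop driven by the large collection: the subquery is re-evaluated
    for every employee although it depends only on the department."""
    return [e.name for e in emps
            if e.salary > avg_salary(e.dept, emps, stats)]


def rewritten(depts, emps, stats):
    """Loop driven by the small collection: the subquery is evaluated once
    per department, then the large collection is filtered against it."""
    result = []
    for d in depts:
        threshold = avg_salary(d.name, emps, stats)
        if threshold is None:  # department without employees
            continue
        result.extend(e.name for e in emps
                      if e.dept == d.name and e.salary > threshold)
    return result
```

With the sample data both functions select the same employees, but `naive` evaluates the subquery once per employee, whereas `rewritten` evaluates it once per department. The rewritten form still scans the large collection in every cycle, and its results come out grouped by department rather than in the original employee order.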