On Redundant Data for Faster Recursive Querying Via ORM Systems Aleksandra Boniewicz Faculty of Mathematics and Computer Sciences Nicolaus Copernicus University Toru´ n, Poland Email:grusia@mat.umk.pl Piotr Wi´ sniewski Faculty of Mathematics and Computer Sciences Nicolaus Copernicus University Toru´ n, Poland Email:pikonrad@mat.umk.pl Krzysztof Stencel Institute of Informatics University of Warsaw Warsaw, Poland Email:stencel@mimuw.edu.pl Abstract—Persistent data of most business applications contain recursive data structures, i.e. hierarchies and networks. Process- ing such data stored in relational databases is not straightfor- ward, since the relational algebra and calculus do not provide adequate facilities. Therefore, it is not surprising that initial SQL standards do not contain recursion as well. Although it was introduced by SQL:1999, even now it is implemented in few selected database management systems. In particular, one of the most popular DBMSs (MySQL) does support recursive queries yet. Numerous classes of queries can be accelerated using redundant data structures. Recursive queries form such a class. In this paper we consider four materialization solutions that speed up recursive queries. Three of them belong to the state-of-the-art, while the fourth one is the contribution of this paper. The latter method assures that the required redundant storage is linearithmic. The other methods do not guarantee such a limitation. We also present thorough experimental evaluation of all these solutions using data of various sizes up to million records. Since all these methods require writing complex code if applied directly, we have prototyped an integration of them into Hibernate object-relational mapping system. This way all the peculiarities are hidden from application developers. Architects can simply choose the appropriate materialization method and record their decisions in configuration files. All necessary routines and storage objects are then generated automatically by the ORM layer. I. I NTRODUCTION Data models of numerous business enterprises encompass recursive data structures in the form of hierarchies and net- works. They store data on e.g. railway networks, bill of ma- terial and product categorization. Their actual storage format can be chosen from a plethora of proposals [1]. There are various ways to query such data. Obviously, a dedicated 3GL client code can be written. Then, the data processing is done on the client side. In this case a significant amount of complex source code must be created, debugged and maintained. This usually causes a noteworthy increase of the budget and a shift in the delivery schedule. Therefore, a server side solution is called for. It was proposed as extensions to SQL, e.g. Oracle’s CONNECT BY clause or recursive Common Table Expressions eventually adopted in SQL:1999. Such extensions have been implemented in numerous database systems [2]. This work was supported by the Polish National Science Centre grants 2011/01/B/ST6/03867 Simultaneously the academia worked on optimization meth- ods for such queries [3], [4], [5]. However, there are still database managements systems that do not support recursion in queries, e.g. MySQL. Since they are widely adopted and used, applications programmers often face the question how to query their recursive data. As noted above, they can choose to hardcode suitable logic in the application. In spite of deceptive simplicity of this solutions, it causes merely troubles: lower efficiency, increased cost and complexity, as well as reduced maintainability. On the other hand, object-relational mapping systems (ORM) [6], [7] are a possible way to solve the above prob- lem. They bridge the gap between data models of relational storage and object-oriented code [8], [9]. Besides this basic functionality, they also establish a thick abstraction layer that can be augmented with abundant features. In our research, we have prepared proof-of-concept extensions to Hibernate that realize recursive queries [10], [11], [12], partial aggregation [13] and functional indices [14]. In particular, we experi- mented with adding recursion on top of database systems that do not implement it directly [15]. In order to accelerate processing recursive queries in such a setting we proposed adding redundant data. In this paper we describe another format of redundant data called logarithmic paths. Its advantage lays in its linearithmic size, while most state-of-the-art methods possibly lead to squared space complexity. We also describe our proof-of- concept implementation of this new method and three known techniques to build redundant data that facilitate recursive querying. They are nested sets, materialized paths and full paths. We show results of extensive performance experiments to verify the quality of these solutions. They have shown that there is no dominating method. All of them have advantages and disadvantages. We summarize them and present recom- mendations when each of them seems to be the most suitable. The contributions of this paper are as follows: • a novel (linearithmic in space) method to build redundant data that accelerate recursive querying, • a proof-of-concept implementation of this method in Hibernate assisted with the implementation of three state- of-the-art methods,