Efficient Query Processing on Relational Data-Partitioning Index Structures Hans-Peter Kriegel, Peter Kunath, Martin Pfeifle, Matthias Renz University of Munich, Germany, {kriegel, kunath, pfeifle, renz}@dbs.informatik.uni-muenchen.de Abstract In contrast to space-partitioning index structures, data-partitioning index structures naturally adapt to the ac- tual data distribution which results in a very good query re- sponse behavior. Besides efficient query processing, mod- ern database applications including computer-aided de- sign, medical imaging, or molecular biology require fully-fledged database management systems in order to guarantee industrial-strength. In this paper, we show how we can achieve efficient query processing on data-parti- tioning index structures within general purpose database systems. We reduce the navigational index traversal cost by using “extended index range scans”. If a directory node is “largely” covered by the actual query, the recursive tree traversal for this node can beneficially be replaced by a scan on the leaf level of the index instead of navigating through the directory any longer. On the other hand, for highly selective queries, the index is used as usual. In this paper, we demonstrate the benefits of this idea for spatial collision queries on the Relational R-tree. Our experiments with an Oracle9i database system show that our new ap- proach outperforms common index structures and the se- quential scan considerably. 1. Introduction The efficient management of complex objects has be- come an enabling technology for many novel database ap- plications, including computer aided design (CAD), medi- cal imaging or molecular biology. For commercial use, a seamless and capable integration of spatial indexing into in- dustrial-strength databases is essential. In contrast to a que- ry optimizer of an ORDBMS which has to decide “once and for all” whether to include a specific access method into the execution plan, the approach of this paper is much more fine-grained. At each directory node of a hierarchical index structure it is individually decided whether it is beneficial to switch to a range scan on the leaf level of the index or whether it is beneficial to take further advantage of the in- dex-directory. The experiments show that our new approach always adapts to the best of the two worlds “index” and “se- quential scan”. Therefore, the optimizer can under all cir- cumstances include our new approach into the query execu- tion plan. In an ORDBMS, the user has no access to the exact information where the blocks are located on the disk. Former approaches which try to generate efficient read schedules for a given set of disk pages [4] must know the actual position of the pages on the storage media. In this paper, we introduce a new approach based on index inherent statistics on top of an ORDBMS exemplarily for spatial intersection queries performed on the Relational R-tree. In [2] it is also shown how our approach can be adapted to similarity range queries and k-nn queries on the Relational M-tree. For more details about relational index- ing, we refer the reader to [1]. The Relational R-tree. In this paragraph, we shortly introduce Relational R-trees, like they have been used by the Oracle developers Ravi Kanth et al. [3]. Figure 1 depicts a hierarchical R-tree along with a possible relational map- ping (page_id, page_lev , son_id, son_data). The column page_id contains the logical page identifier, while page_lev denotes its level in the tree. Thereby, 0 marks the leaf level of the directory. The attribute son_id contains the page_id of the connected entry, while son_data stores its minimum bounding rectangle. To support the navigation through the R-tree table at query time, a built-in index can be created on the page_id column. The remainder of this paper is organized as follows. In Section 2, we present our new indexing approach which combines the advantages of a recursive tree traversal and a sequential scan. In Section 3, we present experimental re- sults and conclude the paper with a few remarks on future work in Section 4. Figure 1: Relational mapping of an R-tree directory a) Hierarchical directory, b) Index table polygons_rtree page id page_lev son_id son_data (MBR) ROOT 2 1 BOX((0,0),(200,120)) 1 1 2 BOX((0,0),(80,60)) 1 1 3 BOX((60,20),(100,120)) 1 1 4 BOX((140,20),(200,120)) 2 0 A … 2 0 B … … … … … 1 2 3 4 A B C D E … A B a) b)