Efficient Support for Ordered XPath Processing in Tree-Unaware Commercial Relational Databases

Boon-Siew Seah (1,2), Klarinda G. Widjanarko (1,2), Sourav S. Bhowmick (1,2), Byron Choi (1), Erwin Leonardi (1,2)
(1) School of Computer Engineering, Nanyang Technological University, Singapore
(2) Singapore-MIT Alliance, Nanyang Technological University, Singapore
{821123145823,klarinda,assourav,kkchoi,lerwin}@ntu.edu.sg

May 23, 2007

Abstract

In this paper, we present a novel approach to ordered XPATH evaluation in tree-unaware RDBMSs. The novelty of our approach lies in the following. (a) We propose a novel XML storage scheme that comprises only leaf nodes, their corresponding data values, order encodings, and their root-to-leaf paths. (b) We propose an algorithm for mapping ordered XPATH queries into SQL queries over this storage scheme. (c) We propose an optimization technique that forces all mapped SQL queries to be evaluated in a “left-to-right” join order. By employing these techniques, we show, through comprehensive experiments, that our approach not only scales well but also performs better than some representative tree-unaware approaches on more than 65% of our benchmark queries, with the highest observed gain factor being 1939. In addition, our approach significantly reduces the performance gap between tree-aware and tree-unaware approaches and even outperforms a state-of-the-art tree-aware approach for certain benchmark queries.

1 Introduction

With the rapid emergence of XML as the de facto standard for exchanging data on the Web, interest in efficiently querying growing XML data sources has increased. One of the salient features of XML data is that it is order-sensitive. Supporting an ordered data model of XML as well as ordered XML queries, ordered XPATH axes and position predicates in particular, has been the key to successful XML applications, e.g., [12]. In this paper, we present a novel approach to efficiently evaluate ordered XPATH queries in a relational database.

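To make the leaf-only storage idea from the abstract more concrete, the following is a minimal sketch in Python. It is an illustration under stated assumptions, not the storage scheme or order encoding defined later in this paper: the row layout (leaf_order, path, value), the use of a simple running counter as the order encoding, and the names shred_leaves and values_on_path are all hypothetical.

```python
import xml.etree.ElementTree as ET


def shred_leaves(xml_text):
    """Return one row per leaf element: (leaf_order, root_to_leaf_path, value).

    Assumption: leaf_order is a plain document-order counter, used here
    only as a stand-in for a real order encoding.
    """
    root = ET.fromstring(xml_text)
    rows = []

    def visit(elem, parent_path):
        path = parent_path + "/" + elem.tag
        children = list(elem)
        if not children:
            # Only leaves are stored; internal nodes survive in the path string.
            rows.append((len(rows) + 1, path, (elem.text or "").strip()))
            return
        for child in children:
            visit(child, path)

    visit(root, "")
    return rows


def values_on_path(rows, path):
    """Hypothetical helper: values of all leaves with this path, in document order."""
    return [value for _, p, value in sorted(rows) if p == path]


DOC = "<book><title>T</title><author>A</author><author>B</author></book>"
ROWS = shred_leaves(DOC)
```

In this sketch, internal nodes are never stored as rows; they are recoverable only from the root-to-leaf path strings, and sibling order among leaves is carried entirely by the counter.
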
Current approaches for evaluating XPATH expressions in relational databases can arguably be categorized into two representative types. They either encode XML data as tables and translate XML queries into relational queries [3, 4, 5, 6, 10, 11, 15], or store XML data as a rich data type and process XML queries by enhancing the relational infrastructure [9]. The former approach can be further classified into two representative types. Firstly, a host of work on processing XPATH queries on tree-unaware relational databases has been reported [5, 10, 11]; these approaches do not modify the database kernel. Secondly, there have been several efforts to make relational databases tree-aware by modifying the database kernel to implement XML support [3, 4, 6, 15]. It has been shown that the latter (tree-aware) approaches appear scalable and, in particular, perform orders of magnitude faster than some tree-unaware approaches [3, 6].

In this paper, we focus on supporting ordered XPATH evaluation in a tree-unaware relational environment. Such an approach offers considerable benefits with respect to portability and ease of implementation on top of an off-the-shelf RDBMS.

Although a diverse set of strategies for evaluating XML queries in tree-unaware relational environments have recently been proposed, few have undertaken a comprehensive study on evaluating ordered XPATH queries. Tatarinov et al. [12] were the first to show that it is indeed possible to support ordered