An Evaluation of Regular Path Expressions with Qualifiers against XML Streams

Dan Olteanu, Tobias Kiesling, François Bry
Institute for Computer Science, University of Munich, Germany
{olteanu,kiesling,bry}@informatik.uni-muenchen.de

Abstract

This paper presents SPEX, a streamed and progressive evaluation of regular path expressions with XPath-like qualifiers against XML streams. SPEX proceeds as follows. An expression is translated in linear time into a network of transducers, most of them having 1-DPDT equivalents. Every stream message is then processed once by the entire network, and result fragments are output on the fly. In most practical cases SPEX needs time linear in the stream size and, for the transducer stacks, memory quadratic in the stream depth. Experiments with a prototype implementation point to very good efficiency of the SPEX approach.

1 Motivation

Querying data streams is motivated by applications such as real-time measurements and continuous services that select information from continuous streams of data, e.g. stock exchange or meteorology data. For selective dissemination of information, streams have to be filtered according to complex queries before being distributed to subscribers [2]. To integrate data over the Internet, particularly from sources with low throughput, it is desirable to process the data progressively before the full stream is retrieved [5]. Furthermore, the data streams considered in such applications can be infinite. Thus, traditional querying approaches based on parsing and buffering are not applicable. The messages of a data stream are conveniently modeled with XML, and message selection is naturally expressed using regular path expressions with qualifiers.

2 Overview

SPEX stands for a streamed and progressive evaluation of regular path expressions against well-formed XML streams.
Streamed evaluation means that a data stream is not completely buffered; progressive processing means that results are streamed and delivered on the fly.

XML Streams and Query Language. Streaming an XML document corresponds to a traversal of the XML document in document order, i.e. a preorder traversal of the document tree. The document tree nodes correspond to stream messages. SPEX supports querying XML streams by means of regular path expressions [1] with qualifiers like those of XPath [9]. More precisely, the query language processed by SPEX subsumes the XPath fragment consisting of child and descendant forward steps, union and intersection set operations, and multiple and nested qualifiers. The qualifiers, i.e. value comparisons and structural conditions, do not create results but rather condition them. Any expression is allowed as a structural condition. E.g. the expression root.*.a[a][b[c='text']].c, where * is a wildcard closure step, selects all c messages that are children of a messages that have (at least) an a child and a b child with a c child that has the value text. Furthermore, the backward steps ancestor and parent are treated. As established in [8], they are expressible in the aforementioned query language fragment. The addition of variables proves to be straightforward [7].

Translation to SPEX Networks. For each regular path expression construct, e.g. a step or a structural condition, a SPEX pushdown transducer is defined. A SPEX transducer is similar to a conventional deterministic pushdown transducer (DPDT), except that it does not have accepting states and that it has two stacks, i.e. it is a 2-DPDT. However, both stacks are updated in a synchronized manner, and most SPEX transducers can be reduced to 1-DPDT [7]. A regular path expression is translated into a network of interconnected SPEX transducers.
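To make the stream model and the example expression above more concrete, the following Python sketch may help. It first turns a document into a sequence of opening and closing messages in document order, and then computes what the expression root.*.a[a][b[c='text']].c is meant to select. It is only a reference reading of the semantics under stated assumptions, not the SPEX evaluation: the second function buffers the whole document, which is exactly what SPEX avoids. The names stream_messages, select_c and DOC are hypothetical, the sample document is invented, text messages are omitted from the stream, and the wildcard closure step is assumed to let a be any descendant of root.

```python
import io
import xml.etree.ElementTree as ET

# Hypothetical sample document, not taken from the paper.
DOC = (
    "<root>"
    "<x><a><a/><b><c>text</c></b><c>first</c></a></x>"
    "<a><b><c>text</c></b><c>second</c></a>"
    "</root>"
)


def stream_messages(xml_text):
    """Yield ("open", tag) / ("close", tag) messages in document order.

    Opening messages appear in preorder; text content is omitted for brevity.
    """
    source = io.BytesIO(xml_text.encode("utf-8"))
    for event, elem in ET.iterparse(source, events=("start", "end")):
        yield ("open" if event == "start" else "close", elem.tag)


def select_c(xml_text):
    """Buffering, non-streaming reading of root.*.a[a][b[c='text']].c.

    Assumption: `*` lets `a` be any descendant of the `root` element.
    """
    root = ET.fromstring(xml_text)
    if root.tag != "root":
        return []
    selected = []
    for a in root.iter("a"):
        # Qualifier [a]: at least one `a` child.
        has_a_child = a.find("a") is not None
        # Qualifier [b[c='text']]: a `b` child with a `c` child valued "text".
        has_b_with_c_text = any(
            c.text == "text" for b in a.findall("b") for c in b.findall("c")
        )
        if has_a_child and has_b_with_c_text:
            # Final step .c: the `c` children of the qualifying `a`.
            selected.extend(a.findall("c"))
    return selected


if __name__ == "__main__":
    for message in stream_messages(DOC):
        print(message)
    print([c.text for c in select_c(DOC)])
```

In this sample only the first a element satisfies both qualifiers, so only its c child is selected; the second a element has no a child and contributes nothing, which illustrates that qualifiers condition the result without creating it.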
A SPEX network is a directed acyclic graph (DAG), where each node is a SPEX transducer and each edge connects the output tape of one transducer to the input tape of its successor. Transducers communicate as follows: a transducer writes a message on its output tape, and the next transducer reads that message from its input tape. The messages that stream into

Proceedings of the 19th International Conference on Data Engineering (ICDE'03), 1063-6382/03 $17.00 © 2003 IEEE