An Evaluation of Regular Path Expressions with Qualifiers
against XML Streams
Dan Olteanu, Tobias Kiesling, François Bry
Institute for Computer Science, University of Munich, Germany
{olteanu,kiesling,bry}@informatik.uni-muenchen.de
Abstract
This paper presents SPEX, a streamed and progressive evaluation of regular path expressions with XPath-like qualifiers against XML streams. SPEX proceeds as follows. An expression is translated in linear time into a network of transducers, most of which have 1-DPDT equivalents. Every stream message is then processed once by the entire network, and result fragments are output on the fly. In most practical cases, SPEX requires time linear in the stream size and, for the transducer stacks, memory quadratic in the stream depth. Experiments with a prototype implementation indicate that the SPEX approach is very efficient.
1 Motivation
Querying data streams is motivated by applications such as real-time measurements and continuous services that select information from continuous data streams, e.g. stock exchange or meteorological data. For the selective dissemination of information, streams have to be filtered according to complex queries before being distributed to subscribers [2]. To integrate data over the Internet, particularly from sources with low throughput, it is desirable to process the data progressively, before the full stream has been retrieved [5]. Furthermore, the data streams considered in such applications can be infinite. Thus, traditional querying approaches based on parsing and buffering the whole input are not applicable. The messages of a data stream are conveniently modeled with XML, and message selection is naturally expressed using regular path expressions with qualifiers.
2 Overview
SPEX stands for a streamed and progressive evaluation of regular path expressions against well-formed XML streams. Streamed evaluation means that a data stream is never completely buffered; progressive processing means that results are streamed and delivered on the fly.
XML Streams and Query Language. Streaming an XML document corresponds to a traversal of the document in document order, i.e. a preorder traversal of the document tree. The document tree nodes correspond to stream messages. SPEX supports querying XML streams by means of regular path expressions [1] with qualifiers like those of XPath [9]. More precisely, the query language processed by SPEX subsumes the XPath fragment consisting of the child and descendant forward steps, the union and intersection set operations, and multiple and nested qualifiers. The qualifiers, i.e. value comparisons and structural conditions, do not contribute to the result but rather condition it. Any expression is allowed as a structural condition. For example, the expression root.*.a[a][b[c='text']].c, where * is a wildcard closure step, selects all c messages that are children of a messages that have (at least) an a child and a b child with a c child whose value is text. Furthermore, the backward steps ancestor and parent are also supported; as established in [8], they are expressible in the aforementioned query language fragment. Adding variables proves to be straightforward [7].
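The semantics of the example expression can be pinned down with a small in-memory reference evaluation in Python. This sketch deliberately ignores streaming: it uses xml.etree.ElementTree on a fully parsed tree, reads the wildcard closure * as matching zero or more steps of any label, and compares values through the concatenated text content of c. These readings, the helper names example_query and string_value, and the sample document are assumptions made for illustration.

```python
import xml.etree.ElementTree as ET
from typing import List


def string_value(elem: ET.Element) -> str:
    """Concatenated text content of an element (XPath-style string value)."""
    return "".join(elem.itertext())


def example_query(document: ET.Element) -> List[ET.Element]:
    """Evaluate root.*.a[a][b[c='text']].c on an in-memory tree.

    Assumption: `document` is the document element, i.e. the only child of
    the virtual root, and `*` matches zero or more steps of any label.
    """
    results: List[ET.Element] = []
    for a in document.iter("a"):  # root.*.a: every a element, in document order
        has_a_child = any(child.tag == "a" for child in a)  # qualifier [a]
        has_b_with_c = any(  # qualifier [b[c='text']]
            b.tag == "b"
            and any(c.tag == "c" and string_value(c) == "text" for c in b)
            for b in a
        )
        if has_a_child and has_b_with_c:
            # Qualifiers only filter the a elements; the result is their c children.
            results.extend(child for child in a if child.tag == "c")
    return results


doc = ET.fromstring(
    "<doc>"
    "<a><a/><b><c>text</c></b><c>hit</c></a>"  # satisfies both qualifiers
    "<a><b><c>text</c></b><c>miss</c></a>"     # lacks an a child
    "</doc>"
)
print([string_value(c) for c in example_query(doc)])  # prints ['hit']
```

Note that the c inside b, although it makes the qualifier true, is not itself selected: qualifiers condition the result without adding to it.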
Translation to SPEX Networks. For each regular path expression construct, e.g. a step or a structural condition, a SPEX pushdown transducer is defined. A SPEX transducer is similar to a conventional deterministic pushdown transducer (DPDT), except that it has no accepting states and that it has two stacks, i.e. it is a 2-DPDT. However, both stacks are updated in a synchronized manner, and most SPEX transducers can be reduced to 1-DPDTs [7].
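A minimal Python sketch of what a pushdown transducer for a single child step might look like follows. It is hypothetical: the message encoding (kind, label, mark), where mark flags elements selected by the preceding steps, and the with_root wrapper that marks a virtual document root are assumptions, not the encoding defined for SPEX in [7]. Like a SPEX transducer, it has no accepting states: it writes every input message to its output as soon as it reads it, only rewriting marks, and it needs just one stack, with one entry per open element.

```python
from typing import Iterable, Iterator, List, Tuple

# Hypothetical message encoding: (kind, label, mark), where kind is "open" or
# "close" and mark says whether the element was selected by the preceding steps.
Message = Tuple[str, str, bool]


def child_step(tape: Iterable[Message], test: str) -> Iterator[Message]:
    """Pushdown transducer for a child step with label test `test` ("*" = any).

    No accepting states: each message is written to the output as soon as it
    is read. The stack holds, per open element, its mark on the input tape.
    """
    stack: List[bool] = []
    for kind, label, mark in tape:
        if kind == "open":
            parent_marked = stack[-1] if stack else False
            stack.append(mark)
            # Mark the element if its parent was marked and its label passes the test.
            yield ("open", label, parent_marked and test in ("*", label))
        elif kind == "close":
            stack.pop()
            yield ("close", label, False)
        else:
            raise ValueError(f"unknown message kind: {kind!r}")


def with_root(messages: Iterable[Message]) -> Iterator[Message]:
    """Wrap a document stream in a marked virtual root (an assumption)."""
    yield ("open", "#root", True)
    yield from messages
    yield ("close", "#root", False)


doc: List[Message] = [
    ("open", "a", False), ("open", "c", False), ("close", "c", False),
    ("open", "b", False), ("open", "c", False), ("close", "c", False),
    ("close", "b", False), ("close", "a", False),
]
# Path root.a.c: the output of the "a" step is the input of the "c" step.
out = list(child_step(child_step(with_root(doc), "a"), "c"))
print([m for m in out if m[0] == "open" and m[2]])  # prints [('open', 'c', True)]
```

Chaining two such steps selects the c children of a children of the root; the c nested under b stays unmarked. Because this sketch only reads and updates the top stack entry, a single stack suffices.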
A regular path expression is translated into a network of interconnected SPEX transducers. A SPEX network is a directed acyclic graph (DAG) in which each node consists of a SPEX transducer and each edge connects the output tape of one transducer to the input tape of its successor. Transducers communicate by having one transducer write a message on its output tape and the next transducer read that message from its input tape. The messages that stream into
Proceedings of the 19th International Conference on Data Engineering (ICDE’03)
1063-6382/03 $ 17.00 © 2003 IEEE