BFilter – An XML Message Filtering and Matching Approach in
Publish/Subscribe Systems
Liang Dai
School of Computer Science, Carleton University, Ottawa, Ontario, Canada
liang_dai@yahoo.com

Chung-Horng Lung, Shikharesh Majumdar
Department of Systems and Computer Engineering, Carleton University, Ottawa, Ontario, Canada
{chlung, majumdar}@sce.carleton.ca
Abstract - In publish/subscribe systems, XML message filtering
performed at the application layer is an important operation for
XML message multicast. As a specific case of content-based
multicast at the application layer, XML message multicast depends
on both the data filtering and matching processes and the forwarding
and routing schemes. As more and more data is exchanged in XML
format, XML message filtering and matching becomes increasingly
desirable. BFilter, proposed in this paper, performs XML
message filtering and matching by leveraging branch points in
both the XML document and the user query. It evaluates user
queries by backward matching of branch points, delaying
further matching until the branch points in the XML document and
the user query match. In this way, XML message filtering can be
performed more efficiently because the probability of a mismatch
is reduced. A number of experiments were conducted, and the results
demonstrate that BFilter outperforms the well-known YFilter for
complex queries.
Keywords - XML; XML message filtering and matching; pub/sub
systems
1. INTRODUCTION
In publish/subscribe (pub/sub) systems or Web services,
application layer multicast is widely used for data
dissemination to subscribers. In pub/sub systems, a subscriber
registers a subscription with the pub/sub service and receives
published messages that match the subscription. A naive approach
is for the sources (publishers) to send all data to all subscribers
and let each subscriber retain whatever it wants. This approach is
clearly inefficient because it generates many duplicate data packets.
Generally speaking, there are two ways to carry out
multicast in the context of pub/sub systems [2,4,7,8,9,11,
13,17,22,23]. The first is to identify the subscribers by using the
subscription information and then send the appropriate data to
them. Data matching can be performed either at the
source or at centralized brokers. The second method is to
perform data matching on the fly. In this way, the source
simply pushes the data into a network organized as a multicast
tree of routers or brokers. The routers or brokers on
the tree have filters that dispatch the appropriate subsets of data
to their children. The children in turn perform data matching and
forward the matched data to their own children.
This continues until the filtered data reaches the subscribers.
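The on-the-fly filtering scheme above can be illustrated with a minimal Python sketch. It assumes a toy model in which each message is a flat dictionary and each broker holds a predicate deciding whether a message is relevant to its subtree; the names `Broker` and `disseminate` are hypothetical and do not come from any pub/sub system or from this paper.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List

# Hypothetical message type: a flat dict of attributes (illustration only).
Message = Dict[str, str]

@dataclass
class Broker:
    name: str
    # Filter deciding whether a message is relevant to this broker's subtree.
    accepts: Callable[[Message], bool]
    children: List["Broker"] = field(default_factory=list)
    # Subscribers attached directly to this broker.
    subscribers: List[str] = field(default_factory=list)

def disseminate(broker: Broker, msg: Message, delivered: List[str]) -> None:
    """Filter at this broker, deliver locally, then forward down the tree."""
    if not broker.accepts(msg):
        return  # prune: nothing below this broker wants the message
    delivered.extend(broker.subscribers)
    for child in broker.children:
        disseminate(child, msg, delivered)

# Usage: a two-level multicast tree with topic-based filters.
sports = Broker("sports", lambda m: m.get("topic") == "sports", subscribers=["alice"])
news = Broker("news", lambda m: m.get("topic") == "news", subscribers=["bob"])
root = Broker("root", lambda m: True, children=[sports, news])

out: List[str] = []
disseminate(root, {"topic": "sports"}, out)
print(out)  # ['alice']
```

Pruning at a broker whose filter rejects the message is what saves bandwidth compared with sending everything to every subscriber; in an XML-based system the predicate would evaluate a structural subscription rather than a dictionary lookup.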
The first approach described above may use keyword-based
multicast [2,11,15,16,18,19,21] or distributed hash table-based
multicast. Distributed hash table-based multicast uses hash
functions to assign keys to subscribers based on their
subscriptions [5]. These methods are efficient in terms of
delivery speed. However, the keyword-based approach is less
expressive because subscriptions contain only keywords, and
the distributed hash table approach is not content-aware. In
both methods, data matching is based on keywords rather than on
content. The second approach delivers data according to its
content, using the subscription description to perform the
matching. A subscription can be represented either as an
n-tuple containing n information spaces or as an XPath expression
[1,6,13,14]. An XPath expression addresses portions of an
XML file, and XPath is more expressive than n-tuples.
An XML file is a tree-based structure for describing
information. Because an XML file is hierarchically structured,
filters can naturally be applied along the hierarchy to perform
data matching and delivery. XML-based multicast can properly
match and deliver messages to subscribers. However, because it
is difficult to index and identify the elements in an XML file,
the filtering process at each node is time consuming. Hence, the
performance of XML-based multicast depends heavily on the
approach used to process XML messages.
Several approaches to XML filtering have been reported in
the literature (see Section 2). A common limitation of these
approaches is that complex queries with nested paths have to
be decomposed into sub-queries, and a post-processing step is
then needed. As a result, the filtering process becomes
inefficient. This paper proposes a novel XML message filtering
algorithm, BFilter (B stands for branch points). BFilter exploits
the tree structure of both XML documents and user queries with
nested paths. It conducts XML message filtering and matching by
identifying branch points in both XML documents and user
queries. User queries are evaluated by backward matching of
branch points to delay further matching, so that the
probability of a mismatch is reduced and XML message
filtering can be performed more efficiently.
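The cost of decomposing nested-path queries can be seen in a small example. The Python sketch below is not BFilter's algorithm (Section 3 describes that); it is a hedged illustration, using `xml.etree.ElementTree`, of why the branch point matters. The hypothetical query asks for a `<book>` that has both an `<author>` and a `<review>` child, so `<book>` is the branch point. Evaluating the two branches as independent linear paths yields a false positive unless a post-processing step checks that both branches meet at the same branch-point element.

```python
import xml.etree.ElementTree as ET

# Two different books: one has only an author, the other only a review.
doc = ET.fromstring(
    "<catalog>"
    "<book><author>Smith</author></book>"
    "<book><review>Good</review></book>"
    "</catalog>"
)

# Hypothetical twig query: a <book> with BOTH an <author> and a <review>.
BRANCH = ".//book"              # the branch point of the query
BRANCHES = ["author", "review"]  # the sub-paths below the branch point

def decomposed_match(root):
    # Each branch evaluated as an independent linear path, with no join.
    return all(root.findall(f"{BRANCH}/{b}") for b in BRANCHES)

def branch_point_match(root):
    # A single branch-point element must satisfy every branch.
    return any(
        all(el.find(b) is not None for b in BRANCHES)
        for el in root.findall(BRANCH)
    )

print(decomposed_match(doc))    # True (false positive)
print(branch_point_match(doc))  # False
```

In a filtering engine, the join that `decomposed_match` lacks is the post-processing step mentioned above; checking all branches together at the branch-point element, as `branch_point_match` does, rules out the false match directly.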
The rest of the paper is organized as follows. Section 2
presents the background. Section 3 discusses the backward
matching branch point algorithm. Section 4 presents
experimental results. Finally, Section 5 summarizes the paper.
2. BACKGROUND AND RELATED WORK
There are two important operations performed in a pub/sub
system: XML message filtering and multicast. This paper
focuses on techniques for XML message filtering.
XFilter [1] is based on deterministic finite automata; it
stores user queries and handles each query individually. It is
capable of handling XPath relationship notations, such as
ancestor/descendant (represented by ‘//’ in XPath) as well as
978-1-4244-5637-6/10/$26.00 ©2010 IEEE
This full text paper was peer reviewed at the direction of IEEE Communications Society subject matter experts for publication in the IEEE Globecom 2010 proceedings.