Towards Efficient Dissemination and Filtering of
XML Data Streams
Kirill Belyaev
Computer Science Department
Colorado State University
Fort Collins, CO 80523-1873
Email: kirill@cs.colostate.edu
Indrakshi Ray
Computer Science Department
Colorado State University
Fort Collins, CO 80523-1873
Email: iray@cs.colostate.edu
Abstract—The vast amounts of data generated in near real-
time due to prolific use of sensors, pervasive usage of mobile
Internet, and popularity of social media platforms, necessitate
the efficient dissemination of the semi-structured streaming data
to the consuming applications. Towards this end, we introduce
a subscriber-centric XML filtering approach that provides a seamless
and efficient XML stream replication/distribution mechanism. The
subscriber-centric filtering architecture can be configured in
different topologies to support efficient message
filtering for a large number of concurrent subscribers. It allows
selective filtering on the various nodes, which improves efficiency
and provides applications with data on a need-to-know basis.
Moreover, it supports interoperability and allows semi-structured
streams generated from multiple sources to be filtered. Our
XML filtering network consists of decoupled data producers,
message transformation agents and XML brokers that can be
deployed in conventional data centers as well as in the public
cloud environment. We provide detailed performance results of
processing filtering queries in several use case scenarios with
varying XML message loads and number of nodes involved in the
replication/dissemination process. Our results indicate that the
subscriber-centric XML filtering architecture is a viable approach
for disseminating semi-structured data streams to the various
consuming applications.
I. INTRODUCTION
With the increase in the usage of mobile devices that
are connected to the Internet, consumers are subscribing to
various types of applications, such as Yahoo! Weather, Yahoo!
Finance, and Twitter, that require delivery of streaming data in
a timely manner. There is a need for gathering semi-structured
streaming data from the sources, transforming them to a form
that facilitates interoperability, and then replicating/distributing
the data stream in an efficient manner to the multifarious
applications needing the data for various purposes, such as
forwarding the data to the subscribing consumers and/or per-
forming complex stream analytics to detect trends or outliers.
Publish/subscribe (pub/sub) has been a popular commu-
nication paradigm which provides customized notifications to
users in a distributed environment [1]. Pub/sub systems are
used in geomarketing, traffic and weather alerts, emergency
response services, and social networking. These systems are
large (e.g. Twitter is estimated to handle over 400 million
tweets daily), geographically distributed and largely subscrip-
tion based. Subscriptions in such systems, such as deals for
local shops and traffic alerts for freeways, involve simple
queries and are short-lived. Such pub/sub systems provide very
little query support and trade expressiveness for performance.
However, their inability to express rich continuous
queries over data streams, possibly in different formats, makes
them unsuitable for detecting complex events that arise in
situation monitoring applications.
The majority of modern Internet applications use XML as
an inter-application communication exchange format in spite of
its heavy network bandwidth utilization. Typically, the applica-
tions generate data in XML so that it can be easily distributed
to other applications by operational runtime environments [2]
[3] [4]. XML-based data dissemination networks are starting
to become a reality [4]. Data generated in XML format should
be adapted for efficient streaming, filtering and consumption
by the subscribing applications. We address this issue by
introducing a subscriber-centric XML content filtering
service in which each XML message generated or received by
the application layer is transformed into a dissemination-ready
XML message for transport over the network infrastructure.
We propose the TeleScope XML filtering broker [5] in
this paper to carry out the selective dissemination/replication
of XML messages to consuming endpoints. Our subscriber-
centric broker has the following characteristics. (i) Fast pro-
cessing of XML messages under high input stream rates and
a large number of subscribers: the TeleScope XML filtering
broker is written in C and supports very fast message filtering
even with a large number of concurrent subscribers.
(ii) Content-based XML filtering with an expressive filtering
language: TeleScope introduces an engine with a simple yet
efficient parser for a user-friendly, domain-specific content
filtering language over XML streams, with full support for
Boolean logic operators as well as supplemental operators such
as network prefix range operators. The language allows easy
integration with XML-consuming applications and does not
require knowledge of complex XPath/XQuery [6] semantics,
yet it supports the common stream filtering/dissemination
scenarios. (iii) Ability to form an overlay filtering network
for XML dissemination: placing TeleScope nodes in a
filtering mesh allows efficient dissemination of
XML content to the endpoints.
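To make characteristic (ii) concrete, the following is a minimal sketch of content-based filtering with Boolean operators and a network prefix range operator. It is written in Python for brevity and is not TeleScope's implementation or query syntax: the helper names (eq, in_prefix, AND, OR, NOT, disseminate), the sample message layout, and the subscriber names are all hypothetical and chosen only for illustration.

```python
import ipaddress
import xml.etree.ElementTree as ET

# Hypothetical predicate builders (illustrative names, not TeleScope's language).
def eq(tag, value):
    """Match when some <tag> element has exactly this text."""
    return lambda msg: any(el.text == value for el in msg.iter(tag))

def in_prefix(tag, cidr):
    """Match when some <tag> element holds an IP address inside the prefix."""
    net = ipaddress.ip_network(cidr)
    def check(msg):
        for el in msg.iter(tag):
            try:
                if ipaddress.ip_address((el.text or "").strip()) in net:
                    return True
            except ValueError:
                continue  # element text is not an IP address; skip it
        return False
    return check

# Boolean combinators over predicates.
def AND(*preds):
    return lambda msg: all(p(msg) for p in preds)

def OR(*preds):
    return lambda msg: any(p(msg) for p in preds)

def NOT(pred):
    return lambda msg: not pred(msg)

def disseminate(xml_text, subscriptions):
    """Parse one XML message and return the subscribers whose filter matches."""
    msg = ET.fromstring(xml_text)
    return [name for name, pred in subscriptions.items() if pred(msg)]

# Assumed sample message and subscriptions, for illustration only.
MESSAGE = "<event><type>alert</type><source>10.1.2.3</source></event>"
SUBSCRIPTIONS = {
    "site_alerts": AND(eq("type", "alert"), in_prefix("source", "10.1.0.0/16")),
    "info_or_external": OR(eq("type", "info"), NOT(in_prefix("source", "10.0.0.0/8"))),
    "non_info": NOT(eq("type", "info")),
}

if __name__ == "__main__":
    print(disseminate(MESSAGE, SUBSCRIPTIONS))
```

In this sketch each subscriber registers one filter, and the broker parses a message once and evaluates every filter against it, so only the matching subscribers receive a copy. For the sample message, the first and third subscriptions match.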
The rest of the paper is organized as follows. Section II
gives a detailed overview of XML stream replication for
consuming applications and describes our subscriber-centric
filtering architecture. Section III highlights the XML filtering
framework. Section IV describes the subscriber management
involved in the task of efficient stream dissemination. Section
2015 IEEE International Conference on Computer and Information Technology; Ubiquitous Computing and Communications;
Dependable, Autonomic and Secure Computing; Pervasive Intelligence and Computing
978-1-5090-0154-5/15 $31.00 © 2015 IEEE
DOI 10.1109/CIT/IUCC/DASC/PICOM.2015.278