2016 IEEE International Conference on Big Data (Big Data)
978-1-4673-9005-7/16/$31.00 ©2016 IEEE 410
Real Time Processing of Streaming and Static Information
Christoforos Svingos (1,*), Theofilos Mailis (1), Herald Kllapi (1,2), Lefteris Stamatogiannakis (1), Yannis Kotidis (3), Yannis Ioannidis (1)
{csvingos, tmailis, herald, estama, yannis}@di.uoa.gr, kotidis@aueb.gr
(1) Dept. of Informatics and Telecommunications, University of Athens, Greece.
(2) Currently at Google.
(3) Dept. of Informatics, Athens University of Economics and Business, Greece.
Abstract—Big Data applications require real-time processing
of complex computations on streaming and static information.
Applications such as the diagnosis of power-generating turbines
require the integration of high-velocity streaming data and large
volumes of static data from multiple sources. In this paper we
study various optimisations for efficiently processing
streaming and static information. We introduce novel indexing
structures for stream processing and a query-planner component
that decides when their creation is beneficial, and we examine
precomputed summarisations on archived measurements to
accelerate streaming and static information processing. To put
our ideas into practice, we have developed EXASTREAM, a data
stream management system that is scalable, has declarative
semantics, supports user defined functions, and allows efficient
execution of complex analytical queries on streaming and static
data. Our work is accompanied by an empirical evaluation of
our optimisation techniques.
Keywords-Stream Processing, SQL, Static Data, Performance
I. INTRODUCTION
Emerging Big Data applications require real-time processing
of complex computations on streaming and static
information. This is a challenging task, since it involves
integrating high-velocity streaming data with large volumes
of static data from multiple sources while executing many
concurrent continuous queries.
A typical scenario described in [1] involves monitoring
and diagnosing power-generating turbines. In this
scenario, several service centres are dedicated to
diagnostics, using data from more than 100,000 thermocouple
sensors installed in 950 power-generating turbines
located across the globe. One typical task of such a centre is
to detect, in real time, potential faults of a turbine caused by,
e.g., an undesirable pattern in the temperature's behaviour within
various components of the turbine. This task requires
extracting, aggregating, and correlating (i) streaming data produced
by up to 2,000 sensors installed in different parts of the
turbine, (ii) static data about the turbine's structure, and (iii)
historical operational data of each sensor stored in multiple
data sources.
* This research has been partially supported by the EU project Optique
(FP7-IP-318338).
This need has triggered the design of scalable approaches
that provide low-latency answers to queries on high-velocity
live streams and high-volume static data sources.
In this paper we study several novel optimisation techniques
for efficiently processing analytical queries on streaming
and static information. In particular: (i) we introduce novel
in-memory indexing structures and algorithms dedicated to
accelerating stream processing; (ii) we propose an adaptive
stream indexing technique that creates, on the fly,
the indexing structures that will accelerate the
execution of live-stream operations.
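As a rough illustration of the adaptive indexing idea, the following Python sketch builds a hash index over a stream window only when a simple cost estimate suggests it will pay off. It is not the indexing structure or planner used in EXASTREAM: the cost rule, the unit costs, and the names should_build_index, build_hash_index, and lookup are all hypothetical, chosen only to show the build-or-scan decision.

```python
from collections import defaultdict


def should_build_index(window_size, expected_probes,
                       build_cost_per_row=1.0, scan_cost_per_row=1.0,
                       probe_cost=1.0):
    """Hypothetical cost rule: index only if it beats repeated full scans."""
    cost_with_index = window_size * build_cost_per_row + expected_probes * probe_cost
    cost_without_index = expected_probes * window_size * scan_cost_per_row
    return cost_with_index < cost_without_index


def build_hash_index(window, key):
    """Group the rows of one window by the value of `key`."""
    index = defaultdict(list)
    for row in window:
        index[row[key]].append(row)
    return index


def lookup(window, key, value, index=None):
    """Use the index when one was built, otherwise scan the window."""
    if index is not None:
        return index.get(value, [])
    return [row for row in window if row[key] == value]


# Illustrative window of sensor readings (made-up values).
window = [
    {"sensor_id": 1, "ts": 0, "temp": 510.0},
    {"sensor_id": 2, "ts": 0, "temp": 495.5},
    {"sensor_id": 1, "ts": 1, "temp": 512.5},
]

# Ten queries are expected to probe this window, so the index is built.
index = None
if should_build_index(len(window), expected_probes=10):
    index = build_hash_index(window, "sensor_id")

rows_for_sensor_1 = lookup(window, "sensor_id", 1, index)
```

With a single expected probe the rule declines to build the index, because one scan is cheaper than building and probing; a real planner would base this decision on measured statistics rather than fixed unit costs.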
To put our ideas into practice, we have developed the
EXASTREAM Data Stream Management System (DSMS),
an experimental DSMS that fuses streaming operators into
the SQLite database engine. EXASTREAM has several significant
features: (i) scalability: the system runs in a
distributed environment, and queries can be added and
removed without disrupting the execution of existing
queries; (ii) declarative semantics: the system provides
a declarative language that extends SQL syntax and
semantics for querying live streams and relations; (iii) user
defined functions: the system natively supports user defined
functions with arbitrary user code; (iv) stream and static
data integration: by virtue of its architecture and implementation,
the system natively supports the integration of streaming and
static data. Note that the optimisations we
propose are general and can be adopted by
other stream processing systems as well.
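To make the combination of SQL, user defined functions, and stream and static data integration more concrete, here is a minimal sketch using Python's standard sqlite3 module. It does not use EXASTREAM's query language or operators; it only assumes that each stream window can be materialised as rows and joined with a static table. The table names sensor and live_window, the UDF to_celsius, the helper process_window, and all values are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Static data: one row per sensor, with its turbine and temperature limit.
conn.execute(
    "CREATE TABLE sensor (sensor_id INTEGER PRIMARY KEY, turbine TEXT, max_temp_c REAL)"
)
conn.executemany(
    "INSERT INTO sensor VALUES (?, ?, ?)",
    [(1, "T-01", 500.0), (2, "T-01", 520.0)],
)

# Holds the rows of the window currently being processed.
conn.execute("CREATE TABLE live_window (sensor_id INTEGER, ts INTEGER, temp_f REAL)")


def to_celsius(temp_f):
    """User defined function with arbitrary Python code, callable from SQL."""
    return (temp_f - 32.0) * 5.0 / 9.0


conn.create_function("to_celsius", 1, to_celsius)


def process_window(rows):
    """Load one window of readings and join it with the static sensor table."""
    conn.execute("DELETE FROM live_window")
    conn.executemany("INSERT INTO live_window VALUES (?, ?, ?)", rows)
    return conn.execute(
        """
        SELECT s.turbine, w.sensor_id, MAX(to_celsius(w.temp_f)) AS peak_c
        FROM live_window AS w
        JOIN sensor AS s ON s.sensor_id = w.sensor_id
        GROUP BY s.turbine, w.sensor_id, s.max_temp_c
        HAVING MAX(to_celsius(w.temp_f)) > s.max_temp_c
        ORDER BY w.sensor_id
        """
    ).fetchall()


# One illustrative window: (sensor_id, timestamp, temperature in Fahrenheit).
alerts = process_window([(1, 0, 900.0), (2, 0, 950.0), (1, 1, 950.0)])
```

In this sketch only sensor 1 is reported, since its peak reading exceeds its limit while sensor 2 stays below its own. A DSMS would apply such a query continuously over successive windows instead of reloading a table by hand.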
In our experimental evaluation we study the effect of
the proposed optimisations in a cloud deployment of
EXASTREAM on up to 128 nodes, using real sensor data
from power-generating turbines. Our findings demonstrate
the effectiveness of our techniques in processing up to
1,000 live stream queries and in performing correlation
analysis between live and archived stream measurements in
real time.
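The correlation analysis between live and archived measurements can be pictured with the short Python sketch below. It assumes the Pearson correlation coefficient as the measure, which the text above does not specify, and it compares one live window with one archived window of equal length. The function name pearson and the sample values are illustrative only.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient of two equally long sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / math.sqrt(var_x * var_y)


# Made-up temperature readings: a live window and an archived window.
live_window = [500.0, 502.0, 504.0, 506.0]
archived_window = [400.0, 401.0, 402.0, 403.0]

r = pearson(live_window, archived_window)
```

Both sequences rise linearly here, so r is close to 1. Computing this pairwise against a large archive is costly, which is where precomputed summaries of archived measurements could help.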
II. SYSTEM OVERVIEW
The EXASTREAM Data Stream Management System
(DSMS) has been designed for the efficient processing of
both static and streaming information. It is embedded in
EXAREME (https://www.exareme.org), a system for elastic
large-scale dataflow processing on the cloud [2], [3] that
is publicly available as an open source project under
the MIT License. EXASTREAM was implemented as a key