# Finding Event Correlations in Federated Wireless Sensor Networks

Ismail Ari, Ömer F. Çelebi, Member, IEEE

Abstract—Event correlation engines help us find events of interest inside raw sensor data streams and, at the same time, reduce the data volume. This paper discusses some of the challenges faced in finding event correlations over federated wireless sensor networks (WSNs), including high data volumes, uncertain or missing data, application-specific dependencies, and widely varying data ranges and sampling frequencies. Analysis of real geo-tracking data from moving objects confirms some of these challenges. Federation at the data layer above the WSNs is presented as a feasible alternative.

Index Terms—Event detection, Correlation, Middleware, Wireless Sensor Networks

## I. INTRODUCTION

WSNs are used for real-time monitoring of physical environments. They help higher-level applications collect relevant data that can be transformed into actionable information. These applications include earthquake monitoring, asset tracking, traffic management, national security, green data centers [21], and, more recently, regulatory hygiene-compliance tracking in hospitals [18]. People managing or using these applications are interested in detecting, and even predicting, concise "special events" (e.g. anomalies) upon which they can take an application-specific action.

Events are semantically different from primitive numeric sensor readings. For example, an event can refer to a numeric threshold violation or to a more complex pattern, such as an ordered (possibly nested) sequence of data items. Finding complex event patterns in high-speed, unbounded, bursty data streams can be as challenging as finding a needle in a haystack. Sometimes only the "existence" of a simple reading in the stream is of interest, and sometimes we look for the "absence" of an event instead of its existence. These tasks become hard when the streams come from distributed sources. The ability to aggregate, order, join, or correlate streaming data is the key to detecting many of these complex situations.
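As an illustration only, the three kinds of events mentioned above (a threshold violation, an ordered sequence, and the absence of an expected event) can be sketched in Python. The `Reading` schema, the function names, and the batch evaluation over a time-sorted list are assumptions made for this sketch; they are not taken from any engine discussed in this paper.

```python
from dataclasses import dataclass
from typing import List, Sequence, Tuple


@dataclass(frozen=True)
class Reading:
    """One timestamped sensor reading (hypothetical schema)."""
    sensor_id: str
    kind: str     # e.g. "temp", "motion", "door_open"
    value: float
    t: float      # seconds since the start of the stream


def threshold_violations(stream: Sequence[Reading], kind: str,
                         limit: float) -> List[Reading]:
    """Simple event: readings of `kind` whose value exceeds `limit`."""
    return [r for r in stream if r.kind == kind and r.value > limit]


def ordered_sequence(stream: Sequence[Reading], kinds: Sequence[str],
                     window: float) -> List[Tuple[float, float]]:
    """Complex event: `kinds` occur in the given order within `window` seconds.

    Assumes `stream` is sorted by time. Returns one (start, end) time
    pair per reading that starts a complete match.
    """
    matches = []
    for i, first in enumerate(stream):
        if not kinds or first.kind != kinds[0]:
            continue
        step, end = 1, first.t
        for r in stream[i + 1:]:
            # Stop once the pattern is complete or the window has expired.
            if step == len(kinds) or r.t - first.t > window:
                break
            if r.kind == kinds[step]:
                step, end = step + 1, r.t
        if step == len(kinds):
            matches.append((first.t, end))
    return matches


def absent_after(stream: Sequence[Reading], trigger_kind: str,
                 expected_kind: str, window: float) -> List[Reading]:
    """Absence event: a trigger with no `expected_kind` within `window` seconds."""
    missing = []
    for i, trig in enumerate(stream):
        if trig.kind != trigger_kind:
            continue
        followed = any(
            r.kind == expected_kind and r.t - trig.t <= window
            for r in stream[i + 1:]
        )
        if not followed:
            missing.append(trig)
    return missing
```

This batch version scans a stored list, so it sidesteps the harder streaming questions raised in the text: on a live, unbounded stream, an absence can only be reported once the window has elapsed, and readings arriving from distributed sources may first have to be ordered.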
This work was supported in part by the Turkish National Institute of Science and Technology (TUBITAK) under Grant E190194, the IBM Shared University Research program, and the European Union FP7 Marie Curie Program under Grant BI4MASSES. I. Ari and O. F. Celebi are with Ozyegin University, Istanbul, 34462 Turkey (corresponding author: ismail.ari@ozyegin.edu.tr, +90-216-559-2331).

Event correlation engines, some of which will be described here, help us describe these scenarios and find event patterns effectively. However, even state-of-the-art systems cannot cope with today's real-time and distributed event processing challenges.

Assuring accurate environmental monitoring using Federated WSNs (FWSNs) while managing the scale is challenging. If only a few sensors are deployed, then the scale of the event-based application and its potential impact are limited. Also, when the density of sensors over the covered area decreases, the network becomes prone to disconnects. Alternatively, if millions of sensors are deployed (to increase the coverage and measurement accuracy), then both the WSNs and the applications are faced with a data deluge, i.e. transferring, storing, and processing large data volumes in near real time. It is impractical to assume a fine-tuned, homogeneous deployment model for FWSNs, as is done in cellular networks, since they are usually used in hostile environments and emergency conditions. If different phenomena are to be measured at different locations, then the cross-organizational nature of the network makes correlations impractical. As a result, while traditional WSNs provide continuous and relatively homogeneous data streams, FWSNs can carry numerous, heterogeneous, and possibly bursty data streams. Other challenges are listed at the end of this section.

Currently, many organizations still use Database Management Systems (DBMS) in an ad-hoc fashion to store and query sensor data.
They face performance problems with time-window-based analysis over unbounded, high-volume streams, since the DBMS architecture was designed for offline analysis. An emerging system architecture called the Data Stream Management System (DSMS) allows concurrent analysis over high-speed in-flight data with different continuous queries and is better suited for real-time applications. Complex Event Processing (CEP) middleware built on top of DSMS engines [1] promises a scalable alternative for WSN data fusion or federation.

### A. Motivating Scenario

Consider the data sample in Table 1, collected from a real geo-tracking system and pertaining to only one vehicle. For this vehicle, with the unique Id (00-123), the data shows the geo-location (longitude and latitude), speed, and time information every 20 seconds. However, due to intermittent disconnects or noise in the channel, many data fields are prone to different types of errors. For example, while a normal value for the longitude and latitude fields would have 8 digits (e.g.