Self-Adaptive Event Recognition for Intelligent Transport Management

Alexander Artikis¹, Matthias Weidlich², Avigdor Gal², Vana Kalogeraki³ and Dimitrios Gunopulos⁴

¹ Institute of Informatics & Telecommunications, NCSR Demokritos, Athens, Greece, a.artikis@iit.demokritos.gr
² Technion - Israel Institute of Technology, Haifa, Israel, {weidlich@tx, avigal@ie}.technion.ac.il
³ Department of Informatics, Athens University of Economics and Business, Greece, vana@aueb.gr
⁴ Department of Informatics and Telecommunications, University of Athens, Greece, dg@di.uoa.gr

Abstract—Intelligent transport management involves the use of voluminous amounts of uncertain sensor data to identify and effectively manage issues of congestion and quality of service. In particular, urban traffic has been in the eye of the storm for many years now and gathers increasing interest as cities become bigger, more crowded, and “smart”. In this work we tackle the issue of uncertainty in the event streams reported by transportation systems. The variety of existing data sources opens new opportunities for testing the validity of sensor reports and self-adapting the recognition of complex events as a result. We report on the use of a logic-based event reasoning tool to identify regions of uncertainty within a stream and demonstrate our method with a real-world use case from the city of Dublin. Our empirical analysis shows the feasibility of the approach when dealing with voluminous and highly uncertain streams.

Keywords-event processing; pattern matching; event calculus

I. INTRODUCTION

Detecting complex event patterns from multiple, highly uncertain data streams is a promising vehicle to support Big Data applications for monitoring, detection, and online response [4], [9]. Consider, for example, an urban monitoring system that identifies road congestion and responds by applying local changes to traffic light control policies to reduce ripple effects.
Such a system may use events that report on the flow in junctions together with reports from buses to collect evidence of congestion in the making.

Two of the main challenges when dealing with Big Data are variety and veracity. Data arriving from multiple heterogeneous sources may be of poor quality and in general requires pre-processing and cleaning when used for analytics and query answering. In particular, sensor networks introduce uncertainty into the system for reasons that range from inaccurate measurements, through local network failures, to unexpected interference by mediators. While the first two reasons are well recorded in the literature, the last is a new phenomenon that stems from the combination of variety and sensor data. Sensor data may go through multiple mediators en route to our systems. Such mediators apply various filtering and aggregation mechanisms, most of which are unknown to the system that receives the data. Hence, the uncertainty that is inherent to sensor data is compounded by unknown aggregation and filtering treatments.

In this work we outline the principle of using variety to effectively handle veracity. In a nutshell, streams from multiple sources are used to generate common complex events. These events are matched against each other to identify mismatches that indicate uncertainty regarding the event streams. Once temporal regions of uncertainty have been identified, the monitoring system autonomously decides how to manage this uncertainty. At times, complete event intervals are discarded. At other times, a selection mechanism prefers one stream over the others. Finally, using multiple sources, one can create a distribution over the possible occurrence of events in inconsistent regions. Our tool of choice for this task is the RTEC (Run-Time Event Calculus) event recognition engine [2].
In addition to standard event algebra operators, RTEC has a built-in representation of the law of inertia [7] that makes it particularly useful for expressing rules that dynamically discard noisy event sources and include reliable ones.

We illustrate our approach using real, heterogeneous data streams concerning city transport and traffic management. First, we use data from Sydney Coordinated Adaptive Traffic System (SCATS) sensors, that is, fixed sensors mounted on intersections to measure traffic flow. Second, we use bus probe data reporting, among other things, the location, line, and delay of each bus, as well as traffic congestion. The voluminous data streams come from the city of Dublin, Ireland, and concern all SCATS sensors of the city and the complete bus fleet. To the best of our knowledge, this is the first approach combining these heterogeneous streams for real-time intelligent transport management.

The contributions of the paper are summarized as follows:
• At a conceptual level, we show how cross-validating multiple data streams can be used for self-adaptive event processing that enhances stream credibility.
• We show how the use of semantics can support reasoning for such cross-validation.
• We provide empirical evidence for the feasibility of the proposed approach.

Organisation. Section II provides an introduction to complex event processing and discusses related research on uncertain data stream handling. Section III presents the event recognition engine that we use. Section IV demonstrates how to model event patterns for city transport and traffic management. These patterns are used in Section V to demonstrate self-adaptation for noisy data stream handling.
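
As an informal illustration of the cross-validation principle outlined in this introduction, the following Python sketch takes the intervals during which two sources each report the same complex event (say, congestion at one intersection) and returns the temporal regions in which exactly one of them reports it. This is not RTEC code and not the formalisation used in the paper: the half-open interval representation and the names `normalise`, `covers`, and `disagreement` are assumptions introduced only for this example.

```python
from typing import List, Tuple

# Hypothetical representation: half-open interval [start, end) in seconds.
Interval = Tuple[int, int]


def normalise(intervals: List[Interval]) -> List[Interval]:
    """Sort intervals and merge those that overlap or touch."""
    merged: List[Interval] = []
    for start, end in sorted(intervals):
        if start >= end:
            continue  # ignore empty or inverted intervals
        if merged and start <= merged[-1][1]:
            merged[-1] = (merged[-1][0], max(merged[-1][1], end))
        else:
            merged.append((start, end))
    return merged


def covers(intervals: List[Interval], t: int) -> bool:
    """True if time point t lies inside one of the intervals."""
    return any(start <= t < end for start, end in intervals)


def disagreement(a: List[Interval], b: List[Interval]) -> List[Interval]:
    """Maximal intervals during which exactly one source reports the event."""
    a, b = normalise(a), normalise(b)
    # Between two consecutive boundary points neither source changes state,
    # so checking the left endpoint of each segment is sufficient.
    cuts = sorted({t for start, end in a + b for t in (start, end)})
    regions = [
        (t0, t1)
        for t0, t1 in zip(cuts, cuts[1:])
        if covers(a, t0) != covers(b, t0)
    ]
    return normalise(regions)


if __name__ == "__main__":
    scats_congestion = [(0, 10), (20, 30)]  # assumed intervals from SCATS data
    bus_congestion = [(5, 12), (20, 30)]    # assumed intervals from bus data
    print(disagreement(scats_congestion, bus_congestion))  # [(0, 5), (10, 12)]
```

What the system then does with such a region (discarding it, preferring one source, or building a distribution over possible event occurrences) is a separate policy decision that this sketch deliberately leaves open.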