Significance-Based Failure and Interference Detection in Data Streams

Nickolas J.G. Falkner and Quan Z. Sheng
School of Computer Science, The University of Adelaide
Adelaide, SA 5005, Australia
{jnick,qsheng}@cs.adelaide.edu.au

Abstract. Detecting the failure of a data stream is relatively easy when the stream is continually full of data. The transfer of large amounts of data allows for the simple detection of interference, whether accidental or malicious. However, during interference, data transmission can become irregular rather than smooth. When the traffic is intermittent, it is harder to detect that failure has occurred, and this may lead an application at the receiving end to request retransmission or to disconnect. Requesting retransmission places additional load on the system, and disconnection can lead to an unnecessary reversion to a checkpointed database before reconnecting and reissuing the same request or response. In this paper, we model the traffic in a data stream as a set of significant events whose arrival rate follows a Poisson distribution. Once an arrival rate has been determined, overdue or lost events can be detected with greater reliability. This model also allows the rate parameter to be altered to reflect changes in the system, and it supports multiple levels of data aggregation. One significant benefit of the Poisson-based model is that transmission events can be deliberately manipulated in time to provide a steganographic channel that confirms sender/receiver identity.

1 Introduction

The extensive use of sensor networks and distributed data gathering systems has increased both the rate and quantity of data delivered to receiving and processing nodes. Rather than processing a finite number of static records at a computationally convenient time, data streams represent the fluctuating and potentially continuous flow of data from dynamic sources [1,2,3].
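To illustrate the idea behind the abstract, the following is a minimal sketch (not from the paper; the function names and the significance level `alpha` are illustrative choices). Under a Poisson arrival model with rate λ, inter-arrival times are exponentially distributed, so a gap longer than t = −ln(α)/λ occurs with probability α; gaps exceeding that threshold can be flagged as likely lost or overdue events rather than triggering an immediate retransmission or disconnect.

```python
import math

def overdue_threshold(rate, alpha=0.01):
    """Waiting time t such that P(inter-arrival > t) = alpha
    under exponential inter-arrivals (Poisson process, given rate)."""
    return -math.log(alpha) / rate

def find_overdue_gaps(arrival_times, rate, alpha=0.01):
    """Return index pairs (i, i+1) whose inter-arrival gap is
    improbably long for the estimated rate."""
    t = overdue_threshold(rate, alpha)
    return [(i, i + 1)
            for i, (a, b) in enumerate(zip(arrival_times, arrival_times[1:]))
            if b - a > t]

# Example: rate = 2 events/sec gives a threshold of ~2.3 s,
# so the 5-second gap between arrivals 1.6 and 6.6 is flagged.
arrivals = [0.0, 0.4, 1.1, 1.6, 6.6, 7.0]
print(find_overdue_gaps(arrivals, rate=2.0))  # -> [(3, 4)]
```

A real detector would also re-estimate the rate online as traffic changes, which is the role the paper assigns to the adjustable rate parameter.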
Data streams provide a rich and challenging source of data, with pattern identification and structure extraction providing important business knowledge for scientific, financial and business applications. Several challenges occur when processing a data stream. Current data stream management techniques focus on a continuously generated stream of information and project analysis windows onto this stream, based on time intervals. While this works for a large number of applications, applications that wish to minimise onwards transmission based on the importance of data values may produce a data stream that appears discontinuous or fragmented. While a stream may be seen to never terminate, with a continuous flow of data, at a given time there may be no transmission activity within the data stream. Although the data stream may represent an abstract continuous data feed, it is implemented

S.S. Bhowmick, J. Küng, and R. Wagner (Eds.): DEXA 2009, LNCS 5690, pp. 645–659, 2009.
© Springer-Verlag Berlin Heidelberg 2009