Warping the Time on Data Streams

Paolo Capitani¹, Paolo Ciaccia¹

¹DEIS - IEIIT-BO/CNR, University of Bologna, Italy

{pcapitani,pciaccia}@deis.unibo.it

Abstract. Continuously monitoring through time the correlation/distance of multiple data streams is of interest in a variety of applications, including financial analysis, video surveillance, and mining of biological data. However, distance measures commonly adopted for comparing time series, such as Euclidean and Dynamic Time Warping (DTW), are either known to be inaccurate or too time-consuming to be applied in a streaming environment. In this paper we propose a novel DTW-like distance measure, called SDTW, which, unlike DTW, can be efficiently updated at each time step, and we experimentally show that it is orders of magnitude faster than DTW without sacrificing accuracy. For instance, with a sliding window of 512 samples, SDTW is 400 times faster than DTW.

1. Introduction

Management of data streams has recently emerged as one of the most challenging extensions of database technology. The proliferation of sensor networks, as well as the availability of massive amounts of streaming data related to telecommunications traffic monitoring, web-click logs, geophysical measurements and many others, has motivated the investigation of new methods for their modelling, storage, and querying. In particular, continuously monitoring through time the correlation of multiple data streams is of interest for detecting similar behaviors of stock prices, for video surveillance applications, for synchronization of biological signals and, more generally, for mining of temporal patterns [Lin et al. 2002, Roddick et al. 2000].

Previous works dealing with the problem of detecting when two or more streams exhibit a high correlation in a certain time interval have tried to extend techniques developed for (static) time series to the streaming environment.
In particular, Zhu and Shasha [Zhu and Shasha 2002], by adopting a sliding window model and the Euclidean distance as a measure of correlation (low distance = high correlation), have been able to monitor up to 10,000 streams in real time on a PC. However, it is known that for time-varying data much better accuracy can be obtained by using the Dynamic Time Warping (DTW) distance [Berndt and Clifford 1994]. Since DTW can compensate for stretches along the temporal axis, it provides a way to optimally align time series that matches users' intuition of similarity much better than Euclidean distance does, as demonstrated several times (see, e.g., [Ratanamahatana and Keogh 2004] for some recent DTW applications and [Bartolini et al. 2005] for a novel DTW-based approach to shape matching). Further, although not a metric, DTW can be indexed [Keogh 2002], which allows this distance to be applied also to large time series archives. Unfortunately, when considering streams the benefits of DTW seem to vanish since, unlike Euclidean distance, it cannot be efficiently updated. The basic reason is that