Information Systems 33 (2008) 240–260

Continuous subspace clustering in streaming time series☆

Maria Kontaki*, Apostolos N. Papadopoulos, Yannis Manolopoulos

Department of Informatics, Aristotle University, 54124 Thessaloniki, Greece

Received 9 January 2007; accepted 11 September 2007
Recommended by F. Carino Jr.

Abstract

Performing data mining tasks on streaming data is considered a challenging research direction, due to the continuous data evolution. In this work, we focus on the problem of clustering streaming time series, based on the sliding window paradigm. More specifically, we use the concept of subspace α-clusters. A subspace α-cluster consists of a set of streams whose value difference is less than α in a number of consecutive time instances (dimensions). The clusters can be continuously and incrementally updated as the streaming time series evolve with time. The proposed technique is based on a careful examination of pair-wise stream similarities for a subset of dimensions, and is then generalized to more streams per cluster. Additionally, we extend our technique to find maximal pClusters in consecutive dimensions, which have been used in previously proposed clustering methods. Performance evaluation results, based on real-life and synthetic data sets, show that the proposed method is more efficient than existing techniques. Moreover, it is shown that the proposed pruning criteria are very important for search space reduction, and that incremental cluster monitoring is computationally more efficient than the re-clustering process.

© 2007 Elsevier B.V. All rights reserved.

Keywords: Continuous processing; Subspace clustering; Streaming time series; Sliding window

1. Introduction

The study of query processing and data mining techniques for data stream processing has recently attracted the interest of the research community [1–3], due to the fact that many applications manage data that change very frequently with respect to time.
Examples of such emerging applications are network monitoring, financial data analysis, and sensor networks, to name a few. The most important property of data streams is that new values arrive continuously, and therefore efficient storage and processing techniques are required to cope with (usually) high update rates. Due to the highly dynamic nature of data streams, random access is prohibitive. Therefore, each data stream can be read only once (or a limited number of times). This feature poses additional difficulties for query processing, since the data can only be accessed in arrival order. Moreover, additional methods are required for data mining tasks, such as clustering and association rule discovery, to cope with the data evolution.

doi:10.1016/j.is.2007.09.001
☆ Research supported by the PENED 2003 program, funded by the General Secretariat for Research and Technology, Ministry of Development, Greece.
* Corresponding author. Tel.: +30 2310991924; fax: +30 2310991913. E-mail address: kontaki@csd.auth.gr (M. Kontaki).

A streaming time series s is a sequence of real values s[1], s[2], ..., where new values are