J Intell Inf Syst (2014) 42:531–566
DOI 10.1007/s10844-013-0290-3
TS-stream: clustering time series on data streams
Cássio M. M. Pereira · Rodrigo F. de Mello
Received: 3 July 2013 / Revised: 13 November 2013 / Accepted: 15 November 2013 /
Published online: 1 December 2013
© Springer Science+Business Media New York 2013
Abstract The current ability to produce massive amounts of data, together with the impossibility
of storing it all, has motivated the development of data stream mining strategies. Despite the
proposal of many techniques, this research area still lacks approaches to mine data streams
composed of multiple time series, which has applications in finance, medicine and science.
Most current techniques for clustering streaming time series have a serious limitation
in their similarity measures, which are based on the Pearson correlation. In this paper, we
show that the Pearson correlation cannot detect similarities even for classic time
series models, such as those by Box and Jenkins. This limitation motivated our proposal to
cluster streaming time series based on their generating functions, which is achieved by
considering features obtained using descriptive measures, such as Auto Mutual Information,
the Hurst Exponent and several others. We present a new tree-based clustering algorithm,
entitled TS-Stream, which uses the extracted features to produce partitions in better
accordance with the time series generating functions. Experiments with synthetic data sets confirm
that TS-Stream outperforms ODAC, currently the most popular technique, in terms of clustering
quality. Using real financial time series from the NYSE and NASDAQ, we conducted stock
trading simulations employing TS-Stream to support the creation of diversified investment
portfolios. Results confirmed that TS-Stream increased monetary returns by several orders
of magnitude when compared to trading strategies simply based on the Moving Average
Convergence Divergence financial indicator.
Keywords Data streams · Clustering · Time series · Decision trees
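The abstract's central claim, that the Pearson correlation can miss similarity between series produced by the same generating function, can be illustrated with a minimal sketch. It assumes two independent realizations of one AR(1) model (a simple Box and Jenkins process) with a hypothetical coefficient of 0.8, and it uses the lag-1 autocorrelation as a stand-in descriptive feature. That feature is chosen here only for brevity; it is not claimed to be one of the measures TS-Stream extracts, and the helper names `ar1` and `lag1_autocorr` are illustrative.

```python
import numpy as np


def ar1(n, phi, rng):
    """Simulate an AR(1) series x_t = phi * x_{t-1} + e_t with Gaussian noise."""
    x = np.zeros(n)
    e = rng.standard_normal(n)
    for t in range(1, n):
        x[t] = phi * x[t - 1] + e[t]
    return x


def lag1_autocorr(x):
    """Sample lag-1 autocorrelation, used here as a stand-in descriptive feature."""
    x = x - x.mean()
    return float(np.dot(x[:-1], x[1:]) / np.dot(x, x))


rng = np.random.default_rng(0)

# Two independent realizations of the same generating function (assumed phi = 0.8).
a = ar1(5000, 0.8, rng)
b = ar1(5000, 0.8, rng)

# Pearson correlation compares the series point by point.
pearson = float(np.corrcoef(a, b)[0, 1])

# A descriptive feature summarizes each series' own dynamics.
feat_a = lag1_autocorr(a)
feat_b = lag1_autocorr(b)

print(f"Pearson correlation between series: {pearson:.3f}")
print(f"Lag-1 autocorrelation: a={feat_a:.3f}, b={feat_b:.3f}")
```

Under these assumptions the point-by-point correlation stays close to zero, while the per-series feature is nearly identical for both realizations, which is the kind of agreement a feature-based similarity can exploit.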
C. M. M. Pereira (✉) · R. F. de Mello
Institute of Mathematical and Computer Sciences–ICMC–USP,
University of Sao Paulo, Av. Trabalhador são-carlense, São Carlos 400 13566-590, SP, Brazil
e-mail: cpereira@icmc.usp.br
R. F. de Mello
e-mail: mello@icmc.usp.br