Research Article
A Hybrid Algorithm for Clustering of Time Series
Data Based on Affinity Search Technique
Saeed Aghabozorgi, Teh Ying Wah, Tutut Herawan, Hamid A. Jalab,
Mohammad Amin Shaygan, and Alireza Jalali
Faculty of Computer Science & Information Technology Building, University of Malaya, 50603 Kuala Lumpur, Malaysia
Correspondence should be addressed to Saeed Aghabozorgi; saeed@um.edu.my
Received 4 October 2013; Accepted 2 February 2014; Published 25 March 2014
Academic Editors: H. Chen, P. Ji, and Y. Zeng
Copyright © 2014 Saeed Aghabozorgi et al. This is an open access article distributed under the Creative Commons Attribution
License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly
cited.
Time series clustering is an important solution to various problems in numerous fields of research, including business, medical
science, and finance. However, conventional clustering algorithms are not practical for time series data because they are essentially
designed for static data. This impracticality results in poor clustering accuracy in several systems. In this paper, a new hybrid
clustering algorithm is proposed based on the similarity in shape of time series data. Time series data are first grouped as subclusters
based on similarity in time. The subclusters are then merged using the k-Medoids algorithm based on similarity in shape. This model
has two contributions: (1) it is more accurate than other conventional and hybrid approaches and (2) it determines the similarity in
shape among time series data with a low complexity. To evaluate the accuracy of the proposed model, the model is tested extensively
using synthetic and real-world time series datasets.
1. Introduction
Clustering is considered the most important unsupervised
learning problem. The clustering of time series data is
particularly advantageous in exploratory data analysis and
summary generation. Time series clustering is also used as a
preprocessing step, either in another time series mining task
or as part of a complex system. Researchers have shown that
using well-known conventional algorithms in the clustering
of static data, such as partitional and hierarchical clustering,
generates clusters with an acceptable structural quality and
consistency and is partially efficient in terms of execution
time and accuracy [1]. However, classic machine learning and
data mining algorithms are ineffective with regard to time
series data because of the unique structure of time series,
that is, its high dimensionality, very high feature correlation,
and (typically) large amount of noise [2–4]. Accordingly,
numerous research efforts have been conducted to present
an efficient approach to time series clustering. However, the
focus on the efficiency and scalability of these methods in
handling time series data has come at the expense of losing
the usability and effectiveness of clustering [5].
The clustering of time series data can be broadly classified
into conventional approaches and hybrid approaches.
Conventional approaches employed in the clustering of
time series data are typically partitioning, hierarchical, or
model-based algorithms. In hierarchical clustering, a nested
hierarchy of similar objects is constructed based on a
pairwise distance matrix [6]. Hierarchical clustering has
great visualization power in time series clustering [7]. This
characteristic has made hierarchical clustering very suitable
for time series clustering [8, 9]. Additionally, hierarchical
clustering does not require the number of clusters as an initial
parameter, in contrast to most algorithms. This characteristic
is a well-known and outstanding feature of this algorithm
and is a strength in time series clustering because
defining the number of clusters is often difficult in real-world
problems. However, hierarchical clustering is cumbersome
when handling large time series datasets [10] because of
its quadratic computational complexity. As a result of its
poor scalability, hierarchical clustering is restricted to small
datasets. On the other hand, partitioning algorithms, such
as the well-known k-Means [11] or k-Medoids algorithm
[12], are among the most used algorithms in this domain.
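As an illustration of the hierarchical approach described above, the following minimal Python sketch builds a pairwise distance matrix and merges the closest clusters until a single nested hierarchy remains. It is not the method proposed in this paper: the Euclidean distance, the single-linkage rule, the toy series, and the function names pairwise_distances and agglomerate are all assumptions chosen for brevity. The distance matrix holds n(n-1)/2 entries for n series, which is the quadratic cost noted above.

```python
from itertools import combinations
from math import dist


def pairwise_distances(series):
    """Return {(i, j): Euclidean distance} for all i < j (n*(n-1)/2 entries)."""
    return {(i, j): dist(series[i], series[j])
            for i, j in combinations(range(len(series)), 2)}


def agglomerate(series):
    """Naive single-linkage agglomerative clustering (illustrative only).

    Returns the merge history as (members_a, members_b, distance) tuples,
    that is, the nested hierarchy; no cluster count is needed up front.
    """
    d = pairwise_distances(series)
    clusters = [frozenset([i]) for i in range(len(series))]
    merges = []

    def linkage(pair):
        # Single linkage: distance between the closest pair of members.
        a, b = pair
        return min(d[min(i, j), max(i, j)] for i in a for j in b)

    while len(clusters) > 1:
        a, b = min(combinations(clusters, 2), key=linkage)
        merges.append((sorted(a), sorted(b), linkage((a, b))))
        clusters = [c for c in clusters if c not in (a, b)] + [a | b]
    return merges


# Toy data (assumed): two rising and two falling series of equal length.
series = [
    [0.0, 1.0, 2.0, 3.0],
    [0.1, 1.1, 2.1, 3.1],
    [3.0, 2.0, 1.0, 0.0],
    [3.1, 2.1, 1.1, 0.1],
]
merges = agglomerate(series)
for members_a, members_b, distance in merges:
    print(members_a, members_b, round(distance, 3))
```

On this toy input the two rising series and the two falling series are merged first, and the two resulting groups are joined last at a much larger distance. Cutting the hierarchy before that final merge yields two clusters without the number of clusters having been specified in advance.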
Hindawi Publishing Corporation
The Scientific World Journal
Volume 2014, Article ID 562194, 12 pages
http://dx.doi.org/10.1155/2014/562194