Combination of Similarity Measures for Time
Series Classification using Genetic Algorithms
Deepti Dohare and V. Susheela Devi
Department of Computer Science and Automation
Indian Institute of Science, India
{deeptidohare, susheela}@csa.iisc.ernet.in
Abstract—Time series classification deals with the problem
of classification of data that is multivariate in nature. This
means that one or more of the attributes is in the form of
a sequence. The notion of similarity or distance, used in time
series data, is significant and affects the accuracy, time, and
space complexity of the classification algorithm. There exist
numerous similarity measures for time series data, but each of
them has its own disadvantages. Instead of relying upon a single
similarity measure, our aim is to find the near optimal solution
to the classification problem by combining different similarity
measures. In this work, we use genetic algorithms to combine
the similarity measures so as to get the best performance. The
weightage given to different similarity measures evolves over a
number of generations so as to get the best combination. We test
our approach on a number of benchmark time series datasets
and present promising results.
I. INTRODUCTION
Time series data are ubiquitous: stock prices, annual rainfall,
and blood pressure readings are all naturally recorded as time
series. In fact, other forms of data can also be meaningfully
converted to time series, including text, DNA, video, audio,
and images [1]. There has also been strong interest in applying
data mining techniques to time series data.
The classification of time series data is an interesting
problem in the field of data mining. The need to classify
time series data arises in a broad range of real-world
applications such as medicine, science, finance, entertainment, and
industries. In cardiology, ECG signals (an example of time
series data) are classified in order to see whether the data
comes from a healthy person or from a patient suffering from
heart disease [2]. In anomaly detection, users' system access
activities on a Unix system are monitored to detect any kind
of abnormal behavior [3]. In information retrieval, documents
are classified into topic categories, a task that has been
shown to be similar to time series classification [4].
Another example in this respect is the classification of signals
coming either from nuclear explosions or from earthquakes,
in order to monitor a nuclear test ban treaty [5].
Generally, a time series $t = t_1, \ldots, t_r$ is an ordered set
of $r$ data points. Here the data points $t_1, \ldots, t_r$ are typically
measured at successive points in time spaced at uniform time
intervals. A time series may also carry a class label. The
problem of time series classification is to learn a classifier
$C$, which is a function that maps a time series $t$ to a class
label $l$, that is, $C(t) = l$ where $l \in L$, the set of class labels.
Time series classification methods can be divided into
three broad categories. The first is the distance based
classification method, which requires a measure to compute the
distance or similarity between pairs of time sequences [6]–[8].
The second is the feature based classification method, which
transforms each time series into a feature vector and
then applies a conventional classification method [9], [10]. The
third is the model based classification method, where a model
such as a Hidden Markov Model (HMM) or another statistical
model is used to classify time series data [11], [12].
In this paper, we consider the distance based classification
method where the choice of the similarity measure affects
the accuracy, as well as the time and the space complexity
of classification algorithms [6]. There exist several similarity
measures for time series data, but each of them has its
own disadvantages. Some well known similarity measures for
time series data are Euclidean distance, Dynamic Time Warping
(DTW) distance, and Longest Common Subsequence (LCSS).
We introduce a similarity based time series classification
algorithm that uses the concept of genetic algorithms. The one
nearest neighbor (1NN) classifier has often been found to perform
better than any other method for time series classification [7].
Due to the effectiveness and simplicity of the 1NN classifier,
we focus on combining different similarity measures into one
and use the resultant similarity measure with the 1NN classifier.
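As a rough illustration of the idea in the preceding paragraph, the sketch below combines two distance measures with fixed weights and uses the result in a 1NN classifier. It is a minimal sketch under stated assumptions, not the method proposed in this paper: the weighted sum, the function names (combined_distance, classify_1nn), the toy data, and the hand-picked weights are all hypothetical, whereas in the proposed approach the weights are evolved by a genetic algorithm. LCSS is omitted for brevity.

```python
import math

def euclidean(a, b):
    """Euclidean distance; assumes both series have the same length."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def dtw(a, b):
    """Basic dynamic time warping distance with absolute-difference cost."""
    n, m = len(a), len(b)
    inf = float("inf")
    # cost[i][j] = minimal warping cost aligning a[:i] with b[:j]
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = abs(a[i - 1] - b[j - 1])
            cost[i][j] = d + min(cost[i - 1][j],      # step in a only
                                 cost[i][j - 1],      # step in b only
                                 cost[i - 1][j - 1])  # step in both
    return cost[n][m]

def combined_distance(a, b, measures, weights):
    """Weighted sum of several distance measures (an assumed combination rule)."""
    return sum(w * f(a, b) for f, w in zip(measures, weights))

def classify_1nn(query, train, measures, weights):
    """Return the label of the training series closest to the query."""
    best_label, best_dist = None, float("inf")
    for series, label in train:
        d = combined_distance(query, series, measures, weights)
        if d < best_dist:
            best_dist, best_label = d, label
    return best_label

# Toy usage with hypothetical data and hand-picked weights.
train = [([0.0, 0.0, 1.0, 1.0], "step"), ([1.0, 1.0, 1.0, 1.0], "flat")]
query = [0.0, 0.1, 0.9, 1.0]
label = classify_1nn(query, train, [euclidean, dtw], [0.5, 0.5])
```

Note that the measures may lie on different scales, so in practice some normalization would probably be needed before the weights become comparable; the sketch ignores this.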
The paper is organized as follows: We present a brief survey
of the related work in Section II. We formally define our
problem in Section III. In Section IV, we describe the proposed
genetic approach for the time series classification. Section V
presents the experimental evaluation. Results are shown in
Section VI. Finally, we conclude in Section VII.
II. RELATED WORK AND MOTIVATION
We begin this section with a brief description of the dis-
tance based classification method. The distance based method
requires a similarity measure or a distance function, which
is used with some existing classification algorithms. In the
current literature, there are over a dozen distance measures
for finding the similarity of time series data. Although many
algorithms have been proposed providing a new similarity
measure as a subroutine to 1NN classifier, it has been shown
401 978-1-4244-7835-4/11/$26.00 ©2011 IEEE