IJE TRANSACTIONS B: Applications Vol. 31, No. 2, (February 2018) 250-262

Please cite this article as: A. Salarpour and H. Khotanlou, An Empirical Comparison of Distance Measures for Multivariate Time Series Clustering, International Journal of Engineering (IJE), IJE TRANSACTIONS B: Applications Vol. 31, No. 2, (February 2018) 250-262

International Journal of Engineering
Journal Homepage: www.ije.ir

An Empirical Comparison of Distance Measures for Multivariate Time Series Clustering

A. Salarpour, H. Khotanlou*

RIV Lab, Department of Computer Engineering, Bu-Ali Sina University, Hamedan, Iran

PAPER INFO

Paper history:
Received 15 October 2017
Received in revised form 06 December 2017
Accepted 21 December 2017

Keywords: Multivariate Time Series Similarity Measures Clustering Evaluation

ABSTRACT

Multivariate time series (MTS) data are ubiquitous in science and daily life, and measuring their similarity is a core part of the MTS analysis process. Many research efforts in this context have focused on proposing novel similarity measures for the underlying data. However, despite the countless techniques for estimating similarity between MTS, the field suffers from a lack of comparative studies based on quantitative, large-scale evaluations. To provide a comprehensive validation, an extensive evaluation of similarity measures for MTS data clustering is conducted. The effectiveness of fourteen well-known similarity measures and their variants was evaluated experimentally on 23 MTS datasets drawn from a wide variety of application domains. In this paper, an overview of these techniques is given, and an empirical comparison of their effectiveness on an agglomerative clustering task is presented. Furthermore, statistical significance tests were used to derive meaningful conclusions.
It was found that all similarity measures are equivalent in terms of clustering F-measure, and that there is no significant difference between them on our datasets. The results provide a comparative background for choosing the most appropriate similarity measure in terms of performance and computation time.

doi: 10.5829/ije.2018.31.02b.08

*Corresponding Author’s Email: khotanlou@basu.ac.ir (H. Khotanlou)

1. INTRODUCTION

In the last few years, multivariate time series (MTS) data have appeared extensively in scientific domains [1, 2]. They carry valuable information that is subject to analysis, clustering, classification, indexing, and interpretation [3-5]. Real-world applications include daily fluctuations of the stock market (financial data analysis [6]), electrocardiogram data mining (medical data processing [7]), and moving object identification (motion data analysis [8]). Even object shapes and handwriting data can be transformed into time series for further analysis. In addition, multivariate time series datasets are always embedded with additional information such as class labels and the place and time of occurrence [9].

A key concept in dealing with multivariate time series data is determining their pairwise similarity. In fact, a multivariate time series similarity (or dissimilarity) measure is a core routine in many data mining [10], retrieval, clustering, and classification tasks [4, 5, 8]. However, deriving a distance that correctly captures the semantics and reflects the underlying similarity of multivariate time series data is not straightforward. Apart from the challenges related to the high dimensionality of such data, the similarity computation needs to be fast and efficient.

The generalized framework for time series mining comprises several stages. The data preparation phase includes sensing, that is, the collection of time series data from different sources such as human, ECG, and stock data.
The pre-processing step cleans the gathered data of missing values. Primary data representation refers to the methods used to represent the stored information. Time series analysis is the most important part of the framework and includes similarity measures and analysis techniques. The similarity measure is responsible for calculating the similarity between time series, which plays an essential role in further analysis. The analysis section could include many techniques that categorize the time series