Information Sciences 442–443 (2018) 186–201
Contents lists available at ScienceDirect. Journal homepage: www.elsevier.com/locate/ins

Simultaneous optimisation of clustering quality and approximation error for time series segmentation

Antonio Manuel Durán-Rosal (a,*), Pedro Antonio Gutiérrez (a), Francisco José Martínez-Estudillo (b), César Hérvas-Martínez (a)

(a) Department of Computer Science and Numerical Analysis, University of Córdoba, Rabanales Campus, Albert Einstein building, Córdoba 14071, Spain
(b) Department of Quantitative Methods, Universidad Loyola Andalucía, Escritor Castilla Aguayo 4, Córdoba 14004, Spain

Article info
Article history: Received 19 October 2016; Revised 9 January 2018; Accepted 17 February 2018; Available online 21 February 2018
Keywords: Time series segmentation; Multiobjective optimisation; Clustering; Evolutionary computation

Abstract
Time series segmentation aims to represent a time series by a set of segments. Some researchers perform segmentation by approximating each segment with a simple model (e.g. a linear interpolation), while others focus on obtaining homogeneous groups of segments, so that common patterns or behaviours can be detected. The main hypothesis of this paper is that these two objectives conflict. Time series segmentation is therefore tackled from a multiobjective perspective in which both objectives are considered simultaneously, and the expert can choose the desired solution from a Pareto front of different segmentations. A specific multiobjective evolutionary algorithm is designed to decide the cut points of the segments, integrating a clustering algorithm for fitness evaluation. The experimental validation of the methodology includes three synthetic time series and three time series from real-world problems.
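To make the two conflicting objectives concrete, the following is a minimal Python sketch; it is not the authors' evolutionary algorithm. Assuming each segment is replaced by the straight line joining its endpoints (consecutive segments share their cut point), the hypothetical function `reconstruction_error` computes the sum of squared errors for a given set of cut points. The hypothetical helpers `dominates` and `pareto_front` then keep only the non-dominated segmentations, assuming both objectives are minimised (for instance, with clustering quality expressed as a cost such as 1 − quality). The clustering step itself is omitted.

```python
from typing import List, Sequence, Tuple


def reconstruction_error(series: Sequence[float], cuts: Sequence[int]) -> float:
    """Sum of squared errors when every segment is replaced by the straight
    line joining its two endpoints (assumed linear-interpolation model).

    `cuts` are indices into `series`; the first and last indices are always
    treated as segment boundaries, and duplicates are ignored.
    """
    bounds = sorted({0, len(series) - 1, *cuts})
    sse = 0.0
    for start, end in zip(bounds, bounds[1:]):
        y0, y1 = series[start], series[end]
        span = end - start
        for i in range(start, end + 1):
            # Value of the interpolating line at position i.
            approx = y0 + (y1 - y0) * (i - start) / span
            # Shared boundary points are visited twice but contribute 0 error.
            sse += (series[i] - approx) ** 2
    return sse


def dominates(a: Tuple[float, float], b: Tuple[float, float]) -> bool:
    """True if `a` is no worse than `b` in every objective and strictly
    better in at least one (all objectives minimised)."""
    return all(x <= y for x, y in zip(a, b)) and any(x < y for x, y in zip(a, b))


def pareto_front(points: List[Tuple[float, float]]) -> List[Tuple[float, float]]:
    """Keep the objective vectors that no other vector dominates."""
    return [p for p in points if not any(dominates(q, p) for q in points)]
```

With cut point 3, the series `[0, 1, 2, 3, 2, 1, 0]` is reconstructed exactly (error 0.0), whereas with no cut points the single line from 0 to 0 gives an error of 19.0. A weighted linear combination of the objectives yields a single solution for each choice of weights, whereas `pareto_front` returns every non-dominated trade-off for the expert to choose from.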
Nine clustering quality assessment metrics are experimentally compared to decide which is most suitable for the algorithm. The proposed algorithm performs well in terms of both clustering quality and reconstruction error, improving on the results of other state-of-the-art mono-objective alternatives and outperforming a simple weighted linear combination of the two corresponding fitness functions.
© 2018 Elsevier Inc. All rights reserved.

* Corresponding author. E-mail address: i92duroa@uco.es (A.M. Durán-Rosal).
https://doi.org/10.1016/j.ins.2018.02.041
ISSN 0020-0255

1. Introduction

Time series are an important class of temporal data objects collected chronologically. The corresponding databases are often large, high-dimensional and require continuous updating, characteristics that make them difficult to analyse. In this context, dimensionality reduction, similarity measurement, segmentation, visualisation and mining methods (such as hidden pattern discovery, clustering, classification or rule discovery) are all part of time series research [16,25,35].

The segmentation task aims to create an accurate approximation of the time series by reducing its dimensionality while retaining its essential features. Its objective is to minimise the reconstruction error of the reduced representation with respect to the original time series. Segmentation not only reduces storage space but also improves the performance of data mining techniques. According to the literature, current time series compression techniques require expert understanding of the time series, and appropriate threshold values need to be adjusted in order to reduce