Applied Soft Computing 23 (2014) 319–332

Journal homepage: www.elsevier.com/locate/asoc

Hybrid meta-heuristic optimization algorithms for time-domain-constrained data clustering

Mª Luz López García a, Ricardo García-Ródenas a,∗, Antonia González Gómez b

a Departamento de Matemática Aplicada, Escuela Superior de Informática, Universidad de Castilla la Mancha, 28012 Ciudad Real, Spain
b Departamento de Matemática Aplicada a los Recursos Naturales, E.T. Superior de Ingenieros de Montes, Universidad Politécnica de Madrid, 28040 Madrid, Spain

Article info

Article history:
Received 5 January 2013
Received in revised form 4 June 2014
Accepted 25 June 2014
Available online 3 July 2014

Keywords:
Time series clustering
Segmentation of multivariate time series
Nelder–Mead simplex search method
Particle swarm optimization
Genetic algorithms
Simulated annealing

Abstract

This paper addresses time-domain-constrained data clustering, a problem in which each observation is labelled with the time at which it was obtained and clusters are required to be contiguous in time (the time-domain constraint). The objective is to partition a multivariate time series into segments that are internally homogeneous with respect to a statistical model specified for each cluster.

In this paper, time-domain-constrained data clustering is formulated as an unrestricted bi-level optimization problem. The clustering problem is stated at the upper level, and at the lower level the statistical models are fitted to the set of clusters determined at the upper level. This formulation is sufficiently general to allow these statistical models to be used as black boxes. A hybrid technique that combines a generic population-based optimization algorithm with Nelder–Mead simplex search is used to solve the bi-level model.
The capability of the proposed approach is illustrated using simulations of synthetic signals and a novel application to survival analysis. This application shows that the proposed methodology is a useful tool for detecting changes in the hidden structure of historical data. Finally, the performance of the hybridizations of particle swarm optimization, genetic algorithms and simulated annealing with Nelder–Mead simplex search is tested on a pattern recognition problem of text identification.

© 2014 Elsevier B.V. All rights reserved.

∗ Corresponding author. Tel.: +34 926295300.
E-mail addresses: Marialuz.lopez@uclm.es (M.L. López García), Ricardo.Garcia@uclm.es (R. García-Ródenas), antonia.gonzalez@upm.es (A.G. Gómez).

Introduction

Time-series segmentation is the problem of partitioning a time series into K time segments that are internally homogeneous [1]. It has been applied in a wide range of fields such as signal analysis [2,3], industrial process monitoring [4–6], time series DNA micro-array analysis [7], loading identification for stable operation of thermal power units [8], automatic segmentation of traffic patterns [9,10], human motion analysis [11], geophysics environmental research [12], among others. The desired goals depend on the specific application: locating stable periods of time, identifying change points, or simply expressing the original time series in a compact way.

One of the most widely used methods for dealing with time-series segmentation problems is cluster analysis. Clustering methods divide a set of objects into groups so that members of the same group are similar to each other and different from members of the other groups. The clustering analysis problem can be formulated as one of optimizing a loss (or merit) function subject to a set of constraints.
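As a minimal sketch of this kind of formulation, and not the method proposed in this paper, the Python fragment below evaluates a loss over K segments that are contiguous in time. The per-segment model (a constant mean with squared-error loss), the function names and the toy series are all assumptions made for illustration. Exhaustive search over breakpoints stands in for the meta-heuristics discussed here and is only feasible for very short series.

```python
from itertools import combinations
from statistics import fmean


def segment_loss(values):
    """Lower level (hypothetical model): fit a constant mean, return squared error."""
    mean = fmean(values)
    return sum((v - mean) ** 2 for v in values)


def total_loss(series, breakpoints, fit=segment_loss):
    """Upper-level objective: sum of per-segment losses over contiguous segments.

    `fit` is treated as a black box that maps a segment to its loss.
    """
    bounds = [0, *breakpoints, len(series)]
    return sum(fit(series[a:b]) for a, b in zip(bounds, bounds[1:]))


def best_segmentation(series, k, fit=segment_loss):
    """Exhaustive search over all ways to cut the series into k contiguous segments."""
    candidates = combinations(range(1, len(series)), k - 1)
    return min(candidates, key=lambda bp: total_loss(series, bp, fit))


# Toy series with three visibly different levels (illustrative data only).
series = [1.0, 1.1, 0.9, 5.0, 5.2, 4.9, 9.0, 9.1]
print(best_segmentation(series, 3))  # → (3, 6)
```

Because the breakpoints are strictly increasing indices, every candidate automatically satisfies the contiguity constraint, and any other per-segment model could be passed in through `fit` without changing the search.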
This approach is very versatile: additional information can be incorporated into the clustering process by adding new constraints or by modifying the objective function. In this way the formulation can include information about the shape of the clusters, the distribution of the data, and the presence of noise and outliers.

In this paper we consider constrained time-dependent clustering analysis. This approach enriches the cluster analysis by adding the time at which the data is obtained to the set of constraints and by taking into account the fact that neighbouring observations in time may belong to the same cluster.

http://dx.doi.org/10.1016/j.asoc.2014.06.046
1568-4946/© 2014 Elsevier B.V. All rights reserved.

K-means and Fuzzy c-Means (FCM) clustering methods have been adapted to solve multiple variants of the clustering analysis problem [13,14]. These methods do not incorporate a time structure to