Applied Soft Computing 23 (2014) 319–332
Journal homepage: www.elsevier.com/locate/asoc
Hybrid meta-heuristic optimization algorithms for
time-domain-constrained data clustering
Mª Luz López García (a), Ricardo García-Ródenas (a,∗), Antonia González Gómez (b)
(a) Departamento de Matemática Aplicada, Escuela Superior de Informática, Universidad de Castilla la Mancha, 28012 Ciudad Real, Spain
(b) Departamento de Matemática Aplicada a los Recursos Naturales, E.T. Superior de Ingenieros de Montes, Universidad Politécnica de Madrid, 28040 Madrid, Spain
Article info
Article history:
Received 5 January 2013
Received in revised form 4 June 2014
Accepted 25 June 2014
Available online 3 July 2014
Keywords:
Time series clustering
Segmentation of multivariate time series
Nelder–Mead simplex search method
Particle swarm optimization
Genetic algorithms
Simulated annealing
Abstract
This paper addresses time-domain-constrained data clustering, a problem in which
each observation is labelled with the time at which it was obtained and clusters are
required to be contiguous in time (the time-domain constraint). The objective is to
partition a multivariate time series into segments that are internally homogeneous
with respect to a statistical model fitted in each cluster.
In this paper, time-domain-constrained data clustering is formulated as an unrestricted
bi-level optimization problem. The clustering problem is stated at the upper level, and
at the lower level the statistical models are fitted to the set of clusters determined at
the upper level. This formulation is sufficiently general to allow these statistical models
to be used as black boxes. A hybrid technique that combines a generic population-based
optimization algorithm with Nelder–Mead simplex search is used to solve the bi-level model.
The capability of the proposed approach is illustrated using simulations of synthetic signals and a novel
application for survival analysis. This application shows that the proposed methodology is a useful tool
to detect changes in the hidden structure of historical data.
Finally, the performance of the hybridizations of particle swarm optimization, genetic
algorithms and simulated annealing with Nelder–Mead simplex search is tested on a
pattern recognition problem of text identification.
© 2014 Elsevier B.V. All rights reserved.
Introduction
Time-series segmentation consists of partitioning a time series into K time
segments that are internally homogeneous [1]. It has been applied in a wide
range of fields, including signal analysis [2,3], industrial process monitoring
[4–6], time-series DNA micro-array analysis [7], loading identification for
stable operation of thermal power units [8], automatic segmentation of traffic
patterns [9,10], human motion analysis [11], and geophysical and environmental
research [12]. The goals depend on the specific application: locating stable
periods of time, identifying change points, or simply expressing the original
time series in a compact way.
∗ Corresponding author. Tel.: +34 926295300.
E-mail addresses: Marialuz.lopez@uclm.es (M.L. López García), Ricardo.Garcia@uclm.es (R. García-Ródenas), antonia.gonzalez@upm.es (A.G. Gómez).
One of the most widely used methods for dealing with time-series
segmentation problems is cluster analysis. Clustering is the process of
dividing a set of objects into groups so that members of the same group
are similar to each other and different from members of the other groups.
The clustering problem can be formulated as one of optimizing a loss (or
merit) function subject to a set of constraints. This approach is very
versatile: additional information can be brought into the clustering
process by adding new constraints or by modifying the objective function.
In this way the formulation can include information about the shape of the
clusters, the distribution of the data and the presence of noise and
outliers. In this paper we consider constrained time-dependent clustering
analysis. This approach enriches the cluster analysis by adding the time at
which the data are obtained to the set of constraints, taking into account
the fact that neighbouring observations in time may belong to the same
cluster.
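The idea of clustering as loss minimization under a time-contiguity constraint can be illustrated with a minimal sketch. It is not the method proposed in this paper. It assumes a univariate series, uses the segment mean as a stand-in for the per-cluster statistical model, and replaces the optimizer with exhaustive search over breakpoints. The function names (segment_cost, total_loss, best_segmentation) and the data are hypothetical.

```python
from itertools import combinations


def segment_cost(values):
    """Within-segment sum of squared deviations from the segment mean.

    Assumption: the segment mean stands in for the per-cluster model.
    """
    mean = sum(values) / len(values)
    return sum((v - mean) ** 2 for v in values)


def total_loss(series, breakpoints):
    """Loss of a time-contiguous partition defined by sorted interior breakpoints."""
    bounds = [0, *breakpoints, len(series)]
    return sum(segment_cost(series[a:b]) for a, b in zip(bounds, bounds[1:]))


def best_segmentation(series, k):
    """Exhaustively search every way to cut the series into k contiguous segments."""
    candidates = combinations(range(1, len(series)), k - 1)
    return min(candidates, key=lambda bp: total_loss(series, bp))


# Illustrative data with three visibly different levels.
series = [1.0, 1.1, 0.9, 5.0, 5.2, 4.9, 9.0, 9.1]
breaks = best_segmentation(series, 3)
print(breaks)  # (3, 6): segments are series[0:3], series[3:6], series[6:8]
```

Because the clusters are forced to be contiguous, a partition is fully described by its K − 1 breakpoints, so the search space is much smaller than for unconstrained clustering. Exhaustive search is only practical for short series and small K.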
K-means and Fuzzy c-Means (FCM) clustering methods have been
adapted to solve multiple clustering analysis problem variants
[13,14]. These methods do not incorporate a time structure to
http://dx.doi.org/10.1016/j.asoc.2014.06.046
1568-4946/© 2014 Elsevier B.V. All rights reserved.