Multi–feature trajectory clustering using Earth Mover’s Distance
Francesca Boem, Felice Andrea Pellegrino, Gianfranco Fenu and Thomas Parisini
Abstract— We present new results in trajectory clustering,
obtained by extending a recent methodology based on Earth
Mover’s Distance (EMD). The EMD can be adapted as a tool for
trajectory clustering, taking advantage of an effective method
for identifying the clusters’ representatives by means of the
p-median location problem. This methodology can be used
either in an unsupervised fashion or on-line, classifying new
trajectories or parts of them; it can handle noisy trajectories
of different lengths as well as occlusions, and it takes velocity
profiles and stops into account. We extend our previous work by
taking into account other features besides the spatial locations;
in particular, we consider the direction of movement at each
trajectory point. We discuss the simulation results and
we compare our approach with another trajectory clustering
method.
I. INTRODUCTION
Behaviour recognition and motion prediction are important
tasks in many applications, for example video surveillance
and robot navigation. The trajectory is one of the most meaningful
features in behavioural analysis: when people move
in space, they usually do not move randomly; instead, they
often engage in typical motion patterns. Moreover, large amounts of
data can be easily collected by tracking and recording the
trajectories of many individuals. The availability of these data
raises the need for trajectory clustering methodologies, aimed
at clustering collected trajectories according to an appropriate
similarity criterion. An important goal is that of classifying
and/or making predictions on the subsequent portions of
trajectory while a new trajectory is being observed (i.e. on-
line). Given the large amount of data and the time con-
straints arising when performing on-line analysis, a method
for recovering a representative of each cluster is desirable,
allowing fast and simple comparisons for performing clas-
sification. Moreover, some well-known tools for clustering,
such as k-means clustering, rely on the availability of a
cluster representative (centroid).
Clustering and prediction of sets of curves are employed in
many areas of science and engineering. A survey of time
series clustering can be found in [1]. In [2], [3] and [4] the
Expectation-Maximization (EM) algorithm is used to cluster
motion trajectories into various classes of motion patterns.
In [5], a cluster-based technique is proposed that learns the
typical motion patterns using pairwise clustering. Classical
k-means algorithms work better with time series of equal
length because the concept of cluster centers often becomes
unclear when the same cluster contains time series of different
lengths. They are applicable to series of different lengths as
long as an appropriate distance measure is used to compute
the distance/similarity. It is a considerable advantage to have
a single representative to be stored for each cluster, and
a number of techniques have been proposed to this end.
Sometimes, a sample trajectory from each cluster is selected
in some way, for example randomly, and then updated. In
[6] and in [7], a density-based approach is proposed in
order to identify clusters’ centroids and to cluster trajectories.
Another approach relies on selecting the trajectory or the
segment that has the longest common subsequence (for
example in [8]). Finally, it is common to choose the existing
element that maximizes a similarity index in the cluster, as
in [9].
This work has been partially supported by the EU Artemis JU project
“CESAR” (contract number 100016; website: http://www.cesarproject.eu).
Francesca Boem, Felice Andrea Pellegrino and Gianfranco Fenu are
with the Department of Industrial and Information Engineering, DI3,
University of Trieste, Italy (francesca.boem@phd.units.it,
fapellegrino@units.it, fenu@units.it)
Thomas Parisini is with Imperial College London, UK and University of
Trieste, Italy. (t.parisini@imperial.ac.uk)
In this paper, we exploit the flexibility of the recently
introduced clustering methodology [10] in order to discriminate
different behaviour features besides spatial position
features. The main idea is to express each trajectory
as a multi-dimensional histogram; the distance between two
given histograms can be computed by means of the EMD,
while the clusters’ representatives are found by solving the
p-median location problem [11]. We show the effectiveness
of the method when dealing with multi–feature trajectories;
in particular, we consider a feature related to the direction
of movement at each trajectory point.
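To make the role of the p-median problem concrete, the following is a minimal sketch, not the solver used in [10] or [11]. It assumes that a matrix of pairwise distances between trajectories is already available (in the methodology these would be EMD values; here the numbers are made up), and it selects the p representatives by exhaustive enumeration, which is only feasible for very small sets. The function name p_median and the toy matrix are hypothetical.

```python
from itertools import combinations


def p_median(dist, p):
    """Brute-force p-median: choose p items as medians so that the total
    distance from every item to its nearest median is minimal."""
    n = len(dist)
    best_cost, best_medians = float("inf"), None
    for medians in combinations(range(n), p):
        # Cost of a candidate set: each item pays the distance to its closest median.
        cost = sum(min(dist[i][m] for m in medians) for i in range(n))
        if cost < best_cost:
            best_cost, best_medians = cost, medians
    # Assign each item to its closest median (ties go to the lowest index).
    labels = [min(best_medians, key=lambda m: dist[i][m]) for i in range(n)]
    return list(best_medians), labels, best_cost


# Toy symmetric distance matrix (assumed values standing in for pairwise EMDs):
# items 0-2 are mutually close, items 3-5 are mutually close.
toy_dist = [
    [0.0, 1.0, 2.0, 9.0, 9.0, 9.0],
    [1.0, 0.0, 1.0, 9.0, 9.0, 9.0],
    [2.0, 1.0, 0.0, 9.0, 9.0, 9.0],
    [9.0, 9.0, 9.0, 0.0, 1.0, 2.0],
    [9.0, 9.0, 9.0, 1.0, 0.0, 1.0],
    [9.0, 9.0, 9.0, 2.0, 1.0, 0.0],
]
medians, labels, cost = p_median(toy_dist, 2)
```

Because the medians are chosen among the existing items, each cluster representative is itself an observed trajectory, so a new trajectory can be classified by comparing it with the p representatives only.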
The paper is organized as follows. In Section II we recall
the Earth Mover’s Distance and its adaptation to trajectory
clustering. Then, in Section III, we show how the metho-
dology can deal with multi–feature trajectories. Finally, in
Section IV, we provide simulation results.
II. EARTH MOVER’S DISTANCE AND P-MEDIAN
PROBLEM FOR TRAJECTORY CLUSTERING
The trajectory of the target whose motion we want to predict
or classify consists of a sequence of coordinates
(x(k), y(k)) and times t(k), one for each observation sample
k = 1, 2, ... (see Fig. 1). Further features may be
part of the trajectory: for example, when the target is
a person, features carrying information about her/his posture
could be added. We imagine that the target scatters
a constant quantity of earth behind itself while walking.
Therefore, the target’s trajectory can be seen as a distribution of a
mass of earth suitably spread in space. In order to discretize
the information content of each trajectory, we construct a
grid over the space where the target is moving: thus the
amount of earth in each cell of the grid corresponds to the
2011 IEEE Conference on Automation Science and Engineering
Trieste, Italy - August 24-27, 2011
ThC1.2
978-1-4577-1732-1/11/$26.00 ©2011 IEEE