Multi-feature trajectory clustering using Earth Mover’s Distance

Francesca Boem, Felice Andrea Pellegrino, Gianfranco Fenu and Thomas Parisini

Abstract: We present new results in trajectory clustering, obtained by extending a recent methodology based on the Earth Mover’s Distance (EMD). The EMD can be adapted as a tool for trajectory clustering, taking advantage of an effective method for identifying the clusters’ representatives by means of the p-median location problem. This methodology can be used either in an unsupervised fashion or on-line, classifying new trajectories or parts of them; it handles noisy trajectories of different lengths as well as occlusions, and it takes velocity profiles and stops into account. We extend our previous work by taking into account other features besides the spatial locations; in particular, we consider the direction of movement at each trajectory point. We discuss the simulation results and compare our approach with another trajectory clustering method.

I. INTRODUCTION

Behaviour recognition and motion prediction are important tasks in many applications, for example video surveillance and robot navigation. Trajectory is one of the most meaningful features in behavioural analysis: when people move in space, they usually do not move randomly; instead, they often engage in typical motion patterns. Moreover, a large amount of data can easily be collected by tracking and recording the trajectories of many individuals. The availability of these data raises the need for trajectory clustering methodologies, aimed at clustering the collected trajectories according to an appropriate similarity criterion. An important goal is that of classifying and/or making predictions on the subsequent portions of a trajectory while a new trajectory is being observed (i.e. on-line).
Given the large amount of data and the time constraints arising when performing on-line analysis, a method for recovering a representative of each cluster is desirable, allowing fast and simple comparisons for performing classification. Moreover, some well-known clustering tools, such as k-means clustering, rely on the availability of a cluster representative (centroid).

This work has been partially supported by the EU Artemis JU project “CESAR” (contract number 100016; website: http://www.cesarproject.eu). Francesca Boem, Felice Andrea Pellegrino and Gianfranco Fenu are with the Department of Industrial and Information Engineering, DI3, University of Trieste, Italy (francesca.boem@phd.units.it, fapellegrino@units.it, fenu@units.it). Thomas Parisini is with Imperial College London, UK and University of Trieste, Italy (t.parisini@imperial.ac.uk).

Clustering and prediction of sets of curves are employed in many areas of science and engineering. A survey on time series clustering can be found in [1]. In [2], [3] and [4] the Expectation-Maximization (EM) algorithm is used to cluster motion trajectories into various classes of motion patterns. In [5], a cluster-based technique is proposed that learns the typical motion patterns using pairwise clustering. Classical k-means algorithms work better with time series of equal length, because the concept of cluster center often becomes unclear when the same cluster contains time series of different lengths. They remain applicable to series of different lengths, provided that an appropriate distance measure is used to compute the distance/similarity. It is a considerable advantage to have a single representative for each cluster to be stored, and a number of techniques have been proposed to this aim. Sometimes, a sample trajectory from each cluster is selected in some way, for example randomly, and then updated.
In [6] and in [7], a density-based approach is proposed in order to identify the clusters’ centroids and to cluster trajectories. Another approach relies on selecting the trajectory or the segment that has the longest common subsequence (for example in [8]). Finally, it is common to choose the existing element that maximizes a similarity index in the cluster, as in [9].

In this paper, we exploit the flexibility of the recently introduced clustering methodology [10] in order to discriminate different behaviour features, besides spatial position features. The main idea is that of expressing each trajectory as a multi-dimensional histogram; the distance between two given histograms can be computed by means of the EMD, while the clusters’ representatives are found by solving the p-median location problem [11]. We show the effectiveness of the method when dealing with multi-feature trajectories; in particular, we consider a feature related to the direction of movement at each trajectory point.

The paper is organized as follows. In Section II we recall the Earth Mover’s Distance and its adaptation to trajectory clustering. Then, in Section III, we show how the methodology can deal with multi-feature trajectories. Finally, in Section IV, we provide simulation results.

II. EARTH MOVER’S DISTANCE AND P-MEDIAN PROBLEM FOR TRAJECTORY CLUSTERING

The trajectory of the target whose motion we want to predict or to classify consists of a sequence of the coordinates (x(k), y(k)) and the time t(k) for each observation sample k = 1, 2, ... (see Fig. 1). Possibly, some further features are part of the trajectory: for example, when the target is a person, features carrying information about her/his posture could be added. We imagine that the target scatters a constant quantity of earth behind her/himself while walking. Therefore, the target’s trajectory can be seen as a distribution of a mass of earth spread in space.
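As a rough illustration of the earth-scattering picture, the following Python sketch turns a trajectory into a histogram of earth over a regular grid. It assumes that the samples are taken at uniform time intervals, so that each observation deposits the same amount of earth. The function name `trajectory_to_histogram`, the `cell_size` parameter and the normalization to unit total mass are assumptions made for this example and are not taken from the paper.

```python
from collections import Counter


def trajectory_to_histogram(points, cell_size):
    """Map a trajectory to a normalized 2-D histogram over a regular grid.

    points    : iterable of (x, y) samples, ASSUMED uniformly spaced in time,
                so every sample deposits one unit of earth.
    cell_size : side of the square grid cells (hypothetical parameter).

    Returns a dict {(col, row): mass}; the masses sum to 1 (the
    normalization is an assumption of this sketch).
    """
    points = list(points)
    if not points:
        raise ValueError("empty trajectory")
    if cell_size <= 0:
        raise ValueError("cell_size must be positive")

    earth = Counter()
    for x, y in points:
        # Floor division assigns each sample to the cell that contains it,
        # also for negative coordinates.
        cell = (int(x // cell_size), int(y // cell_size))
        earth[cell] += 1

    total = len(points)
    return {cell: count / total for cell, count in earth.items()}


# Illustrative trajectory: the target crosses cell (0, 0) and then
# lingers in cell (1, 0).
trajectory = [(0.2, 0.1), (0.7, 0.3), (1.4, 0.2),
              (1.6, 0.4), (1.5, 0.5), (1.5, 0.5)]
histogram = trajectory_to_histogram(trajectory, cell_size=1.0)
```

Under these assumptions, a target that lingers in one cell accumulates more earth there, so stops and slow motion leave a visible trace in the histogram.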
2011 IEEE Conference on Automation Science and Engineering, Trieste, Italy, August 24-27, 2011. ThC1.2. 978-1-4577-1732-1/11/$26.00 ©2011 IEEE

In order to discretize the information content of each trajectory, we construct a grid over the space where the target is moving: thus the amount of earth in each cell of the grid corresponds to the