Robust Time-Referenced Segmentation of Moving Object Trajectories

Hyunjin Yoon and Cyrus Shahabi
University of Southern California
Los Angeles, CA 90089-0781
{hjy, shahabi}@usc.edu

Abstract

Trajectory segmentation is the process of partitioning a given trajectory into a small number of homogeneous segments w.r.t. some criteria. Conventional segmentation techniques only focus on the spatial features of the movement and could lead to spatially homogeneous segments but with presumably dissimilar temporal structures. Furthermore, trajectories could be over-segmented in the presence of outliers. In this paper, we propose a family of three trajectory segmentation methods that takes into account both geo-spatial and temporal structures of movement for the segmentation and is also robust with respect to time-referenced spatial outliers. The effectiveness of our methods is empirically demonstrated over three real-world datasets.

1. Introduction

A trajectory of a moving object is a series of locations sampled at discrete instances of time and defined as a sequence of pairs, $(p_1, t_1), (p_2, t_2), \ldots, (p_n, t_n)$, where $p_i$ is a two- or three-dimensional vector representing the geo-spatial position observed at a timestamp $t_i$ ($i = 1, \ldots, n$). Various types of trajectory data tracking the movement of vehicles, animals, or human subjects have been acquired using location-aware sensors and exploited to find similar trajectories [4], discover frequent spatio-temporal patterns [3, 6, 8], and eventually obtain insights into the behavioral traits of moving objects.

Often the size of a trajectory, i.e., the number of observations $n$, is large. For example, the elk trajectories used in [8] contain about 1430 observations on average, and the size of bus trajectories used in [3] varies in the range from 1000 to 7000.
It is therefore necessary to preprocess the trajectories to reduce the dimensionality and compress them into a compact and concise representation in order to process them efficiently in the subsequent data analysis tasks.

This research has been funded in part by NSF grants IIS-0238560 (PECASE), IIS-0534761, IIS-0742811 and CNS-0831505 (CyberTrust), and in part by the METRANS Transportation Center, under grants from USDOT and Caltrans. Any opinions, findings, and conclusions or recommendations expressed in this material are those of the author(s) and do not necessarily reflect the views of the National Science Foundation.

Trajectory segmentation is an attempt to partition a given trajectory into a small number of homogeneous segments, such that the data within each segment are similar w.r.t. some criteria and thus can be effectively described by a simple model [2]. A typical approach previously adopted for trajectory segmentation [3, 8] takes as input a simple sequence of the sampled locations of a trajectory (obtained by dropping the timestamp component), which we call a route of a moving object to explicitly distinguish it from a trajectory. The approach first selects a subset of the sampled locations, identified as characteristic points (CPs), where the geometric structure (e.g., spatial closeness, co-linearity, or movement direction) of the given route changes substantially. Subsequently, only the selected CPs are retained to approximate the input trajectory as a sequence of line segments, each connecting two consecutive CPs. Figure 1 illustrates a route with 9 sampled positions and its desirable segmentation into four continuous and non-overlapping segments.

[Figure 1 image: a route in the X-Y plane with sampled positions $p_1$ to $p_9$, partitioned into segments $S_1$ to $S_4$. Legend: sampled positions, characteristic points.]
Figure 1.
An example of route segmentation

Our key observation is that such segmentations, which discard the time component, could lead to spatially homogeneous segments but with presumably dissimilar temporal or spatio-temporal structures unless a constant sampling rate is assumed.¹ Suppose the first four locations in Figure 1 are acquired at an irregular sampling rate, e.g., time-stamped at 1, 2, 3, and 13, respectively. From the timestamps together with the moving distances, it can be derived that the speed of the moving object varies within the obtained segment $S_1$: the object moves fast from $p_1$ to $p_2$, similarly fast from $p_2$ to $p_3$, and then slowly from $p_3$ to $p_4$. Since the movement speed changes significantly at $p_3$, the segment $S_1$ should have been partitioned at $p_3$ to result in genuinely homogeneous segments in terms of both

¹ Irregular sampling rates are usually encountered in real-world sensor data due to the inherent imprecision of sensor devices, missing data, network failure or delay, disturbance signals, etc.

2008 Eighth IEEE International Conference on Data Mining 1550-4786/08 $25.00 © 2008 IEEE DOI 10.1109/ICDM.2008.133 1121
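As a rough illustration of the timestamp example in the introduction (a sketch under stated assumptions, not the segmentation method proposed in this paper), the following Python snippet computes the average speed over each pair of consecutive samples. Only the timestamps 1, 2, 3, and 13 come from the text; the coordinates are assumed for illustration, and `step_speeds` is a hypothetical helper name.

```python
from math import hypot

# Assumed coordinates for the first four samples (not given in the paper):
# equally spaced points, so the route alone looks perfectly homogeneous.
positions = [(0.0, 0.0), (1.0, 0.0), (2.0, 0.0), (3.0, 0.0)]
# Timestamps taken from the example in the text.
timestamps = [1, 2, 3, 13]


def step_speeds(positions, timestamps):
    """Return the average speed between each pair of consecutive samples."""
    speeds = []
    pairs = zip(positions, positions[1:], timestamps, timestamps[1:])
    for (x0, y0), (x1, y1), t0, t1 in pairs:
        dt = t1 - t0
        if dt <= 0:
            raise ValueError("timestamps must be strictly increasing")
        # Euclidean distance travelled divided by elapsed time.
        speeds.append(hypot(x1 - x0, y1 - y0) / dt)
    return speeds


speeds = step_speeds(positions, timestamps)
print(speeds)  # [1.0, 1.0, 0.1]
```

With these assumed coordinates, the three steps cover the same distance, yet the last step is ten times slower than the first two. A purely spatial criterion sees no change along the route, while the time-referenced view exposes the change in speed at the third sample.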