Clustering Moving Objects Based On A Moving Clustering Feature Tree

Chih Lai
Edward A. Heuer
Graduate Programs in Software Engineering
University of St. Thomas
St. Paul, MN 55125
clai@stthomas.edu
eaheuer@stthomas.edu

ABSTRACT

Clustering moving objects is a challenging task, especially when space consumption must be flexibly and efficiently adjusted for adapting to dynamic object movements. In this paper we develop an efficient approach for managing moving objects and predicting the essential time when moving clusters may need to be updated. Under our approach, moving objects are first inserted into a moving clustering feature (MCF) tree such that similar moving objects are grouped into moving micro clusters (MMCs). Each MMC is represented by a vector that summarizes the position and velocity information of its member objects. Based on this summarized information, a set of simple formulas is developed to efficiently predict when the contents of MMCs must be changed. High quality final clusters can then be obtained by executing a global clustering algorithm against MMCs. In addition, our approach can efficiently condense MMCs or the MCF tree to conserve space. We will also show that our approach can easily accommodate velocity changes by objects. Finally, we study the performance and quality of our approach.

Categories and Subject Descriptors: H.2.8 [Database Management]: Database Application – Data Mining.

General Terms: Algorithms.

Keywords: Moving micro clusters (MMCs), moving cluster feature tree (MCF tree), open/close events.

1. Introduction

Most existing clustering algorithms [1][2][3][5][6][12] are designed to discover snapshot clusters that reflect only the static status of a database. However, many real-world objects are moving objects: they keep changing their status over time. For example, cellular phone users may drive from one place to another, animals keep migrating in different seasons, and children continue to grow in height and weight.
If we can compute the velocities of individual objects from past data and take these velocities into account during clustering, we can generate clusters that evolve over time. One naïve way to predict future clusters is to repeatedly execute a snapshot clustering algorithm on the entire database at regular intervals. Unfortunately, choosing the right interval length for this time-driven approach is difficult. If the intervals are too long, many cluster changes may go undetected. Shortening the intervals alleviates this problem, but it also wastes system resources in the many intervals during which clusters do not change significantly.

A better approach is to dynamically maintain a small set of moving micro clusters (MMCs) [9] that represent groups of similar moving objects, and to predict when the contents of MMCs will change. This predicted timing information not only indicates when to update MMCs, it also helps users decide when to execute global clustering on the representative MMCs.

The content of an MMC must be changed when either of two types of events occurs. The first is an open event, in which a moving object moves away from its containing MMC and joins another one. The second is a close event, in which multiple MMCs move near each other over a time period. To predict when these events will happen, [9] uses multiple kinetic heaps in each MMC to continuously track the ordering of objects, which consumes space several times the size of the database. Moreover, MMCs in [9] stay in the system forever once they are created, and they cannot be efficiently merged because of their kinetic heaps and queues. As a result, the number of MMCs may keep growing as objects leave dense areas, leading to low-density MMCs and prolonged global clustering time.
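To make the idea of predicting an event time concrete, the following minimal sketch computes the earliest time at which the centers of two micro clusters come within a given distance of each other. It is an illustration only, under stated assumptions: each center moves linearly at a constant velocity, a close event is simplified to a single distance test between two centers, and the function name `first_time_within` and the parameter `d` are hypothetical. It is not the formula set developed in this paper, nor the kinetic-heap method of [9].

```python
import math
from typing import Optional, Sequence


def first_time_within(
    p1: Sequence[float], v1: Sequence[float],
    p2: Sequence[float], v2: Sequence[float],
    d: float,
) -> Optional[float]:
    """Earliest t >= 0 at which two linearly moving centers are within
    distance d of each other, or None if that never happens.

    Hypothetical helper: p1, p2 are centers at t = 0; v1, v2 are velocities.
    """
    # Relative position and relative velocity of center 1 with respect to center 2.
    dp = [a - b for a, b in zip(p1, p2)]
    dv = [a - b for a, b in zip(v1, v2)]

    # Squared distance at time t is a*t^2 + b*t + (c + d^2).
    a = sum(x * x for x in dv)
    b = 2.0 * sum(x * y for x, y in zip(dp, dv))
    c = sum(x * x for x in dp) - d * d

    if c <= 0:
        return 0.0   # already within distance d at t = 0
    if a == 0:
        return None  # identical velocities: the gap never changes
    disc = b * b - 4.0 * a * c
    if disc < 0:
        return None  # the centers never come within distance d
    t = (-b - math.sqrt(disc)) / (2.0 * a)  # smaller root = first contact
    return t if t >= 0 else None            # a negative root lies in the past
```

For example, two centers starting at (0, 0) and (10, 0) and approaching each other with velocities (1, 0) and (-1, 0) first come within distance 2 at t = 4. Because the event time comes from a closed-form expression, no periodic re-clustering is needed to detect it, which is the advantage of event-driven prediction over the time-driven approach.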
Finally, because the MMCs in [9] are rectangular, an open event removes the boundary objects of an MMC rather than the objects that are furthest from its center and thus contribute the most error.

To address these problems, we observe that the methods discussed in [7] and [8] can be easily integrated with the BIRCH Clustering Feature (CF) tree, allowing us to quickly identify the time instances of open/close events and to efficiently condense MMCs to adapt to object movements. Under our approach, the initial positions and velocities of moving objects are first inserted into a hierarchical Moving Clustering Feature (MCF) tree. Moving objects that are similar to each other are then grouped, at the initial time, into MMCs on the leaf nodes of the MCF tree according to a predefined similarity threshold. Each MMC is summarized and represented by an MCF vector from which its average velocity, future centers, and future radius (error) can be easily computed. The MCF vector of an MMC is updated only when the content of the MMC is affected by an open or close event, or when its member objects change their velocities. An open event will be scheduled at time t to split an object from its containing MMC if the radius of the MMC is predicted to