Efficiently Querying Moving Objects with Pre-defined Paths in a Distributed Environment

Cyrus Shahabi, Mohammad R. Kolahdouzan, Snehal Thakkar, Jose Luis Ambite and Craig A. Knoblock
Department of Computer Science and Information Sciences Institute
University of Southern California
Los Angeles, California 90089
[shahabi, kolahdoz, snehalth]@usc.edu [ambite, knoblock]@isi.edu

This research has been funded in part by NSF grants EEC-9529152 (IMSC ERC) and ITR-0082826, NASA/JPL contract nr. 961518, DARPA and USAF under agreement nr. F30602-99-1-0524, and unrestricted cash/equipment gifts from NCR, IBM, Intel and SUN.

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. To copy otherwise, to republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Copyright 2001 ACM XXXXXXXXX/XX/XX ...$5.00.

ABSTRACT
Due to the recent growth of the World Wide Web, numerous spatio-temporal applications can obtain their required information from publicly available web sources. We consider those sources maintaining moving objects with predefined paths and schedules, and investigate different plans to perform queries on the integration of these data sources efficiently. Examples of such data sources are networks of railroad paths and schedules for trains running between cities connected through these networks. A typical query on such data sources is to find all trains that pass through a given point on the network within a given time interval. We show that traditional filter+semi-join plans would not result in efficient query response times on distributed spatio-temporal sources. Hence, we propose a novel spatio-temporal filter, called the deviation filter, that exploits both the spatial and temporal characteristics of the sources in order to improve the selectivity. We also report on our experiments comparing the performance of the alternative query plans and conclude that the plan with the spatio-temporal filter is the most viable and superior plan.

1. INTRODUCTION
The explosive growth of the Internet has made a wealth of networked information available. Much of this information is geographical, spatial, temporal, or pertains to objects that have a spatial or temporal nature. The sources of this information are heterogeneous: traditional databases with spatial extensions, geographical information systems (GIS) software packages, mapping and imagery web sites, web sites with spatial information, etc. An increasing number of web sites have information of a geospatial or temporal character. For example, detailed satellite images can be obtained from sites such as www.terraserver.com; maps from www.mapquest.com; train schedules from www.amtrak.com; geolocated points of interest such as train stations from www.usgs.gov; geographical features such as railroad networks from www.nima.mil; etc. The number of sources and the quality and detail of the information available are continually growing all around the globe. In this paper, we focus our attention on how to efficiently query moving objects such as trains in a distributed environment such as the one mentioned above.

Recently there has been a growing interest in moving object databases, which manage spatial objects whose positions change over time [2, 3, 4, 5, 6, 7, 8]. Example applications are those that query the locations of trains, cars and planes for a given time interval. The main challenge investigated by these studies is how to model the large spatio-temporal data needed to track the position of any object at any given time (in the past, at present, or in the future).
In this paper, we consider an environment where the content of the moving object database does not need to be modified to reflect the movement of the objects. We term this environment “moving objects with predefined paths and schedules.” An example application is to query the location of trains moving on a railroad network. By storing the schedules of trains’ departures and arrivals, the locations of the stations, and the vector data corresponding to the railroad network, we have enough information to query the location of any moving object (i.e., train) at any given time. Note that the database still needs to be modified (e.g., when a schedule changes), but it does not need to be updated (and/or appended) as the objects move around the network within the provided schedule. The challenge, however, is that queries such as finding the location of a train in a given time interval are time-consuming because expensive functions, such as the shortest path function, need to be performed on large vector data as well as the