Efficient k Nearest Neighbor Queries on Remote Spatial Databases Using Range Estimation (Draft Version)

Dan-Zhou Liu, Ee-Peng Lim, Wee-Keong Ng
Center for Advanced Information Systems, Nanyang Technological University
Nanyang Avenue, Singapore 639798, SINGAPORE
{P149571472, aseplim, awkng}@ntu.edu.sg

Abstract

k-Nearest Neighbor (k-NN) queries are used in GIS and CAD/CAM applications to find the k spatial objects closest to some given query points. Most previous k-NN research has assumed that the spatial databases to be queried are local, and that the query processing algorithms have direct access to their spatial indices, e.g., R-trees. Clearly, this assumption does not hold when k-NN queries are directed at remote spatial databases that operate autonomously. While it is possible to replicate some or all of the spatial objects from the remote databases in a local database and build a separate index structure for them, such an alternative is infeasible when the database is huge, or when there are a large number of spatial databases to be queried. In this paper, we therefore propose a k-NN query processing algorithm that uses one or more window queries to retrieve the nearest neighbors of a given query point. We also propose three different methods to estimate the ranges to be used by the window queries. Each range estimation method requires different statistical knowledge about the spatial databases. Our experiments on the TIGER data show that our proposed algorithm, coupled with different range estimation methods, can handle k-NN queries efficiently. Apart from not requiring direct access to the spatial indices, the window queries used in our proposed algorithm can be easily supported by non-spatial database systems containing spatial objects.

1 Introduction

1.1 Motivation

Nearest Neighbor (NN) queries in spatial databases find the spatial objects nearest to some given query points.
NN queries are used in a wide range of applications, such as Geographic Information Systems (GIS), Computer Aided Design (CAD), computational biology, decision support, and pattern recognition [29]. NN queries in spatial databases can be classified into five major categories: simple k-NN queries [2, 6, 8, 9, 17, 23, 24, 28], approximate k-NN queries [3, 10, 14], reverse NN queries [22, 30], constrained k-NN queries [13], and k-NN join queries [18]. In this paper, we focus on simple k-NN queries. Given a set of spatial objects denoted by S, and a distance function d, a simple k-NN query for a query point q is to find the