Reverse Spatial and Textual k Nearest Neighbor Search

Jiaheng Lu, Ying Lu
DEKE, MOE and School of Information, Renmin University of China
{jiahenglu, yinglu}@ruc.edu.cn

Gao Cong
School of Computer Engineering, Nanyang Technological University, Singapore
gaocong@ntu.edu.sg

ABSTRACT
Geographic objects associated with descriptive texts are becoming prevalent. This gives prominence to spatial keyword queries that take into account both the locations and the textual descriptions of content. Specifically, the relevance of an object to a query is measured by spatial-textual similarity, which is based on both spatial proximity and textual similarity. In this paper, we define the Reverse Spatial Textual k Nearest Neighbor (RSTkNN) query, i.e., finding objects that take the query object as one of their k most spatial-textual similar objects. Existing works on reverse kNN queries focus solely on spatial locations and ignore text relevance.

To answer RSTkNN queries efficiently, we propose a hybrid index tree called the IUR-tree (Intersection-Union R-Tree) that effectively combines location proximity with textual similarity. Based on the IUR-tree, we design a branch-and-bound search algorithm. To further accelerate query processing, we propose an enhanced variant of the IUR-tree, called the clustered IUR-tree, and two corresponding optimization algorithms. Empirical studies show that the proposed algorithms offer scalability and are capable of excellent performance.

Categories and Subject Descriptors
H.2.8 [Database Management]: Database Applications - Spatial databases and GIS

General Terms
Algorithms, Design, Experimentation, Performance

Keywords
Reverse k nearest neighbor, Spatial-keyword query

1. INTRODUCTION
The Reverse k Nearest Neighbor (RkNN) query [10], which finds the objects whose k nearest neighbors (kNN) include the query point, has received considerable attention.
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. To copy otherwise, to republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. SIGMOD'11, June 12–16, 2011, Athens, Greece. Copyright 2011 ACM 978-1-4503-0661-4/11/06 ...$10.00.

Among its many applications, the RkNN query is used to discover influence sets, i.e., the objects in a dataset that are highly influenced by the query object [10]. In the literature [2, 3, 7, 10, 19, 20, 22, 25], spatial distance is usually considered the sole influence factor. However, in real applications, distance alone is not sufficient to characterize the influence between two objects. For example, two objects, e.g., restaurants, are more likely to influence each other if their textual descriptions (e.g., seafood buffet lunch including crab and shrimp) are similar.

In contrast, we take textual similarity into account in RkNN and study a new kind of RkNN problem, called the Reverse Spatial and Textual k Nearest Neighbor (RSTkNN) query, that considers both spatial distance and textual similarity. An RSTkNN query finds the objects that have the query object as one of their k most spatial-textual similar objects. This new type of query is different from RkNN queries (e.g., [10]) and spatial-keyword queries (e.g., LkT [6]); see Section 8 for a detailed comparison.

Figure 1 gives an example that illustrates the proposed RSTkNN query and the conventional RkNN query. Points p1, ..., p9 in Fig. 1(a) are existing branch stores in a region, and q is a new store that will open. (N1, ..., N7 in Fig. 1(a) are MBRs, to be explained in Section 4.) The products of each branch store are given in Fig. 1(b), where the weight of each item can be calculated by TF-IDF [18].
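To make the RSTkNN definition above concrete, the following is a minimal brute-force sketch in Python. It is not the IUR-tree algorithm proposed in this paper. The similarity function `sim_st`, the weighting parameter `alpha`, the use of cosine similarity for text, and the toy data are all assumptions made for illustration; they are not taken from this section or from Figure 1.

```python
import math

def cosine(u, v):
    """Cosine similarity of two term-weight vectors (assumed text measure)."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv) if nu and nv else 0.0

def sim_st(a, b, alpha, max_dist):
    """Hypothetical spatial-textual similarity: a weighted sum of
    normalized spatial proximity and textual similarity."""
    (loc_a, vec_a), (loc_b, vec_b) = a, b
    spatial = 1.0 - math.dist(loc_a, loc_b) / max_dist
    return alpha * spatial + (1.0 - alpha) * cosine(vec_a, vec_b)

def rstknn(objects, q, k, alpha=0.5):
    """Return the names of objects that have q among their k most
    spatial-textual similar objects (brute force, quadratic time)."""
    points = [loc for loc, _ in objects.values()] + [q[0]]
    # Largest pairwise distance, used to normalize proximity into [0, 1].
    max_dist = max(math.dist(a, b) for a in points for b in points) or 1.0
    result = []
    for name, obj in objects.items():
        sim_q = sim_st(obj, q, alpha, max_dist)
        # Count the other objects that are strictly more similar to obj than q is.
        better = sum(
            1
            for other_name, other in objects.items()
            if other_name != name and sim_st(obj, other, alpha, max_dist) > sim_q
        )
        if better < k:
            result.append(name)
    return result

# Toy data (hypothetical): name -> ((x, y), term-weight vector).
stores = {
    "a": ((0, 0), (1, 0)),
    "b": ((1, 0), (0, 1)),
    "c": ((10, 0), (0, 1)),
}
query = ((8, 0), (1, 0))

spatial_only = rstknn(stores, query, k=1, alpha=1.0)
spatial_textual = rstknn(stores, query, k=1, alpha=0.5)
```

On this toy data, the purely spatial setting (`alpha=1.0`) returns only `c`, the store nearest to the query, whereas the mixed setting (`alpha=0.5`) returns `a`, which is far from the query but has an identical term vector. This mirrors how textual similarity can change the answer set relative to a conventional RkNN query.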
In Figure 1, an RSTkNN query with q as the query object finds the existing stores that will be influenced most by q, considering both the locations of the stores and the products that the stores sell. Assuming k = 2, the result of the traditional RkNN query is {p4, p5, p9}, while the result of our RSTkNN query is {p1, p4, p5, p9}. Note that p1 is one of the answers because the textual description of p1 is quite similar to that of q, even though q is not one of the 2 nearest neighbors of p1 in terms of spatial distance alone.

[Figure 1: An example of RSTkNN queries. (a) Distribution of branch stores: points p1 to p9, the query point q(12,6), and MBRs N1 to N7 in the x-y plane. (b) Locations and products of branch stores in (a): for each store and for q, the coordinates x and y and a weight vector (ObjVct1 to ObjVct9, ObjVctQ) over the items laptop, camera, diaper, pan, sportswear, and stationery.]

RSTkNN queries have many applications, ranging from map-based Web search to GIS decision support. For example, a shopping mall can use an RSTkNN query to find