Search Algorithms for Multiway Spatial Joins

DIMITRIS PAPADIAS
Department of Computer Science
Hong Kong University of Science and Technology
Clear Water Bay, Hong Kong
Email: dimitris@cs.ust.hk

DINOS ARKOUMANIS
Department of Electrical and Computer Engineering
National Technical University of Athens
Greece, 15773
Email: dinosar@dbnet.ece.ntua.gr

Abstract. This paper deals with multiway spatial joins when (i) there is limited time for query processing and the goal is to retrieve the best possible solutions within this limit, or (ii) there is unlimited time and the goal is to retrieve a single exact solution, if such a solution exists, or the best approximate one otherwise. The first case is motivated by the high cost of join processing in real-time systems involving large amounts of multimedia data, while the second one is motivated by applications that require “negative” examples. We propose several search algorithms for query processing under these conditions. For the limited-time case we develop non-deterministic search heuristics that can quickly retrieve good solutions. However, these heuristics are not guaranteed to find the best solutions, even without a time limit. Therefore, for the unlimited-time case we describe systematic search algorithms tailored specifically for the efficient retrieval of a single solution. Both types of algorithms are integrated with R-trees in order to prune the search space. Our proposal is evaluated through an extensive experimental comparison.

1. Introduction

A multiway spatial join can be expressed as follows: Given n datasets $D_1, D_2, \ldots, D_n$ and a query Q, where $Q_{ij}$ is the spatial predicate that should hold between $D_i$ and $D_j$, retrieve all n-tuples

$\{(r_{1,w}, \ldots, r_{i,x}, \ldots, r_{j,y}, \ldots, r_{n,z}) \mid \forall i,j : r_{i,x} \in D_i,\ r_{j,y} \in D_j \text{ and } r_{i,x}\ Q_{ij}\ r_{j,y}\}$.

Such a query can be represented by a graph where nodes correspond to datasets and edges to join predicates.
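To make the definition concrete, the following minimal sketch enumerates every n-tuple and keeps those that satisfy all join predicates of the query graph. It is a naive exhaustive baseline for illustration only, not one of the search algorithms proposed in this paper. The rectangle layout (xmin, ymin, xmax, ymax), the function names, the 0-based dataset indices, and the use of overlap as the predicate are assumptions of the example.

```python
from itertools import product

# Assumption: a rectangle is a tuple (xmin, ymin, xmax, ymax).
def overlaps(a, b):
    """True if two axis-aligned rectangles share at least one point."""
    return a[0] <= b[2] and b[0] <= a[2] and a[1] <= b[3] and b[1] <= a[3]

def multiway_join(datasets, query):
    """Exhaustively enumerate all n-tuples satisfying every join predicate.

    datasets: list of n lists of rectangles (D_1..D_n, indexed from 0 here).
    query:    dict mapping a graph edge (i, j) to a predicate f(r_i, r_j) -> bool.
    """
    results = []
    # The Cartesian product has |D_1| * ... * |D_n| candidate tuples.
    for combo in product(*datasets):
        if all(pred(combo[i], combo[j]) for (i, j), pred in query.items()):
            results.append(combo)
    return results

# Three tiny hypothetical datasets.
D1 = [(0, 0, 2, 2), (10, 10, 11, 11)]
D2 = [(1, 1, 3, 3)]
D3 = [(2.5, 2.5, 4, 4), (20, 20, 21, 21)]

# Chain (tree) query: D_1 - D_2 - D_3.
chain = {(0, 1): overlaps, (1, 2): overlaps}
chain_solutions = multiway_join([D1, D2, D3], chain)

# Clique query: every pair of datasets must join.
clique = {(0, 1): overlaps, (1, 2): overlaps, (0, 2): overlaps}
clique_solutions = multiway_join([D1, D2, D3], clique)
```

On this data the chain query has one exact solution, while the clique query has none because the rectangles chosen from the first and third datasets do not overlap. The number of candidate tuples grows as the product of the dataset sizes, which is why exhaustive enumeration is impractical for large inputs.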
Equivalently, the graph can be viewed as a constraint network (Dechter and Meiri 1994) where the nodes are problem variables and the edges are binary spatial constraints. In the sequel we use the terms variable/dataset and constraint/join condition interchangeably. Following the common convention in the spatial database literature, we assume that the default join condition is overlap (intersect, non-disjoint). In this case the graph is undirected ($Q_{ij} = Q_{ji}$) and, if $Q_{ij}$ = True, then the rectangles from the corresponding inputs i, j should overlap. Figures 1a and 1b illustrate two example queries: the first one has an acyclic (tree) graph, and the second one has a complete (clique) graph.

Figure 1 Example queries and solutions: (a) chain (tree) query, (b) clique query, (c) approximate solutions for clique

We use the notation $v_i \leftarrow r_{i,x}$ to express that variable $v_i$ is instantiated to rectangle $r_{i,x}$ (which belongs to domain $D_i$). A binary instantiation $\{v_i \leftarrow r_{i,x},\ v_j \leftarrow r_{j,y}\}$ is inconsistent if there is a join condition $Q_{ij}$, but $r_{i,x}$ and $r_{j,y}$ do not overlap. A solution is a set of n instantiations $\{v_1 \leftarrow r_{1,w}, \ldots, v_i \leftarrow r_{i,x}, \ldots, v_j \leftarrow r_{j,y}, \ldots, v_n \leftarrow r_{n,z}\}$ which, for simplicity, can
This is the Pre-Published Version