Distributed Processing of Range Queries with Non-Spatial Selections

DongEun Kim, HaRim Jung, GiWoong Nam, Hee Yong Youn and Ung-Mo Kim
School of Information and Communication Engineering, Sungkyunkwan University
27309, 2066, Seobu-Ro, Jangan-Gu, Suwon, Gyeonggi-do, Korea
dongsilver1@gmail.com, harim3826@gmail.com, nku0691@skku.edu, youn7147@skku.edu, ukim@skku.edu

ABSTRACT

In this paper, we focus on the problem of processing spatial range queries with non-spatial selections. To process such queries, we first introduce a baseline search algorithm. We then propose a novel search algorithm on the Hilbert R-tree that reduces the number of data accesses. Both the baseline algorithm and the proposed algorithm use MapReduce, because traditional query processing methods that run on a single machine may suffer drastic performance degradation when the dataset becomes extremely large. Through simulations, we compare the performance of the baseline algorithm and the proposed algorithm and verify the efficiency of the proposed algorithm.

KEYWORDS

Range queries, MapReduce, Hilbert R-tree, Location-based services, Geographic information systems

1 INTRODUCTION

During the past decade, spatial databases have received increasing interest because of their important role in many modern applications, such as geographic information systems (GIS), multimedia databases, navigation systems, urban planning, and traveler information systems. In many real-life Location-Based Services (LBSs), the service provider uses range queries to focus on a specific area. However, users often need more specific results that depend on non-spatial attributes as well as on spatial information, so the goal is to find the objects within a specific range that also satisfy conditions on their non-spatial attributes. For example, as shown in Figure 1, suppose that a Social Networking Service (SNS) user wants to find a specific class of people (e.g., 20 ≤ age ≤ 30 and gender = female) within queryRange.
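To make the query type concrete, the following Python sketch evaluates a range query with a non-spatial selection by brute force. It is an illustration under stated assumptions, not the paper's baseline or its Hilbert R-tree algorithm: the `SpatialObject` record, the `(x_min, y_min, x_max, y_max)` rectangle encoding, the helper names `in_rect`, `map_split`, `reduce_matches`, and `young_female`, and all sample data are hypothetical (Figure 1 gives no coordinates). The map and reduce steps are only simulated in a single process to mirror how MapReduce splits the work, not run on an actual cluster.

```python
from dataclasses import dataclass
from typing import Callable, Iterable, List, Tuple


# Hypothetical record: a point location plus two non-spatial attributes.
@dataclass(frozen=True)
class SpatialObject:
    oid: str
    x: float
    y: float
    age: int
    gender: str


# Hypothetical encoding of the query rectangle: (x_min, y_min, x_max, y_max).
Rect = Tuple[float, float, float, float]
Predicate = Callable[[SpatialObject], bool]


def in_rect(obj: SpatialObject, rect: Rect) -> bool:
    """Spatial part of the query: is the object inside the rectangle?"""
    x_min, y_min, x_max, y_max = rect
    return x_min <= obj.x <= x_max and y_min <= obj.y <= y_max


def map_split(split: Iterable[SpatialObject], rect: Rect, pred: Predicate) -> List[str]:
    """Simulated map step: one worker scans only its own data split."""
    return [o.oid for o in split if in_rect(o, rect) and pred(o)]


def reduce_matches(partials: Iterable[List[str]]) -> List[str]:
    """Simulated reduce step: concatenate the per-split answers."""
    return [oid for part in partials for oid in part]


def young_female(o: SpatialObject) -> bool:
    # Non-spatial selection from the example: 20 <= age <= 30 and gender = female.
    return 20 <= o.age <= 30 and o.gender == "female"


# Invented sample data (not taken from Figure 1).
objects = [
    SpatialObject("o1", 2.0, 3.0, 25, "female"),  # inside, qualifies
    SpatialObject("o2", 4.0, 1.0, 35, "female"),  # inside, fails age condition
    SpatialObject("o3", 3.0, 4.0, 21, "male"),    # inside, fails gender condition
    SpatialObject("o4", 9.0, 9.0, 28, "female"),  # qualifies but lies outside the range
    SpatialObject("o5", 1.0, 2.0, 21, "female"),  # inside, qualifies
]
query_range = (0.0, 0.0, 5.0, 5.0)

splits = [objects[:3], objects[3:]]  # pretend the data is divided between two mappers
result = reduce_matches(map_split(s, query_range, young_female) for s in splits)
print(result)  # ['o1', 'o5']
```

Every object is examined once, so this scan touches all n objects no matter how small queryRange is; cutting down such data accesses is the motivation the abstract gives for the Hilbert R-tree-based algorithm.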
In the example of Figure 1, the service provider should report the qualifying objects (e.g., q2 and q3) to the user. We use MapReduce [1] for distributed computing throughout the whole process. MapReduce [1] is a programming model and an associated implementation for processing and generating large datasets, and it is amenable to a broad variety of real-world tasks. Users specify the computation in terms of a map function and a reduce function, and the underlying runtime system automatically parallelizes the computation across large-scale clusters of machines, handles machine failures, and schedules inter-machine communication to make efficient use of the network and disks.

[Figure 1: objects q1 to q12, each labeled with (age, gender), and the rectangular query region queryRange]
Figure 1. An example of searching for non-spatial information in queryRange

To solve the problem, we propose a range search algorithm using the Hilbert R-tree [2]. The Hilbert R-tree is a variant of the R-tree that uses the Hilbert curve, a space-filling curve shown to preserve spatial locality most effectively, to guide data insertion. The entries in the tree nodes are sorted by their Hilbert values. Each query
ISBN: 978-0-9891305-4-7 ©2014 SDIWC 159