Distributed Processing of Range Queries with Non-Spatial Selections
DongEun Kim, HaRim Jung, GiWoong Nam, Hee Yong Youn and Ung-Mo Kim
School of Information and Communication Engineering, Sungkyunkwan University
27309, 2066, Seobu-Ro, Jangan-Gu, Suwon, Gyeonggi-do, Korea
dongsilver1@gmail.com, harim3826@gmail.com,
nku0691@skku.edu, youn7147@skku.edu, ukim@skku.edu
ABSTRACT
In this paper, we focus on the problem of processing
spatial range queries with non-spatial selections. In
order to process range queries with non-spatial
selections, we first introduce a baseline search
algorithm. Then, we propose a novel search algorithm
on the Hilbert R-tree to reduce the number of data
accesses. Both the baseline algorithm and the proposed
algorithm utilize MapReduce because traditional
single-machine query processing methods might suffer
from drastic performance degradation when the size of
the dataset becomes extremely large. Through
simulations, we compare the performance of the
baseline algorithm with that of the proposed algorithm
and verify the efficiency of the proposed algorithm.
KEYWORDS
Range queries, MapReduce, Hilbert R-tree,
Location-based services, Geographic information systems
1 INTRODUCTION
During the past decade, spatial databases have
received increasing interest due to their important
role in many modern applications, such as
geographic information systems (GIS),
multimedia databases, navigation systems, urban
planning, and traveler information systems. In
many real-life Location Based Services (LBSs),
the service provider employs range queries to
focus on a specific area only. However, some users
need more specific results that satisfy conditions
on non-spatial attributes as well as on location.
We therefore consider the problem of finding the
objects within a specific range whose non-spatial
attributes satisfy given conditions. For example,
in Figure 1, suppose that a Social Networking
Service (SNS) user wants to find a specific class of
people (e.g., 20 ≤ Age ≤ 30 and gender = female)
in queryRange. In this case, the service provider
should report the results (e.g., q2 and q3) to the
user. We use MapReduce [1] for distributed
computing throughout the whole process.
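The example query in the paragraph above can be written as a spatial containment test combined with a predicate on the non-spatial attributes. The following is a minimal single-machine sketch of that query semantics only, not the baseline or the proposed algorithm of this paper. The names SpatialObject, Rect, and range_query_with_selection are hypothetical, and the coordinates are invented for illustration because the figure's coordinates are not given in the text; only the (age, gender) labels are taken from Figure 1.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SpatialObject:
    oid: str
    x: float       # hypothetical coordinate (not given in the paper)
    y: float       # hypothetical coordinate (not given in the paper)
    age: int       # non-spatial attribute from the Figure 1 labels
    gender: str    # non-spatial attribute from the Figure 1 labels


@dataclass(frozen=True)
class Rect:
    x_min: float
    y_min: float
    x_max: float
    y_max: float

    def contains(self, x: float, y: float) -> bool:
        # Closed rectangle: points on the border count as inside.
        return self.x_min <= x <= self.x_max and self.y_min <= y <= self.y_max


def range_query_with_selection(objects, query_range, predicate):
    """Return objects inside query_range that also satisfy predicate."""
    return [o for o in objects
            if query_range.contains(o.x, o.y) and predicate(o)]


# A few objects from Figure 1; positions are assumptions for illustration.
objects = [
    SpatialObject("q1", 2.0, 8.0, 25, "male"),
    SpatialObject("q2", 4.0, 6.0, 25, "female"),
    SpatialObject("q3", 5.0, 5.0, 21, "female"),
    SpatialObject("q4", 9.0, 2.0, 28, "female"),
    SpatialObject("q7", 5.5, 6.5, 23, "male"),
]
query_range = Rect(3.0, 4.0, 7.0, 7.0)

result = range_query_with_selection(
    objects,
    query_range,
    lambda o: 20 <= o.age <= 30 and o.gender == "female",
)
print([o.oid for o in result])  # ['q2', 'q3']
```

With these assumed positions, q4 satisfies the non-spatial condition but lies outside queryRange, and q7 lies inside queryRange but fails the gender condition, so only q2 and q3 are reported.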
MapReduce [1] is a programming model and an
associated implementation for processing and
generating large datasets that is amenable to a
broad variety of real-world tasks. Users specify
the computation in terms of a map and a reduce
function, and the underlying runtime system
automatically parallelizes the computation across
large-scale clusters of machines, handles machine
failures, and schedules inter-machine
communication to make efficient use of the
network and disks.
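To make the programming model concrete, the sketch below simulates the map and reduce phases in a single Python process. It is an illustration under stated assumptions, not the Hadoop API and not the algorithms evaluated in this paper: the functions map_fn, reduce_fn, and run_mapreduce, the record layout (oid, x, y, age, gender), and the sample coordinates are all hypothetical. Each partition stands in for an input split that a real runtime would assign to a separate machine.

```python
from collections import defaultdict


def map_fn(record, query):
    """Emit (key, oid) for a record that satisfies the whole query."""
    oid, x, y, age, gender = record
    x_min, y_min, x_max, y_max = query["range"]
    in_range = x_min <= x <= x_max and y_min <= y <= y_max
    selected = (query["age_min"] <= age <= query["age_max"]
                and gender == query["gender"])
    if in_range and selected:
        yield ("result", oid)


def reduce_fn(key, values):
    """Merge all object ids emitted under the same key."""
    return key, sorted(values)


def run_mapreduce(partitions, query):
    # Map phase: a real runtime would run this per partition in parallel.
    intermediate = []
    for partition in partitions:
        for record in partition:
            intermediate.extend(map_fn(record, query))

    # Shuffle phase: group intermediate values by key.
    groups = defaultdict(list)
    for key, value in intermediate:
        groups[key].append(value)

    # Reduce phase: one call per distinct key.
    return dict(reduce_fn(key, values) for key, values in groups.items())


# Two hypothetical input splits; coordinates are invented for illustration.
partitions = [
    [("q1", 2.0, 8.0, 25, "male"), ("q3", 5.0, 5.0, 21, "female")],
    [("q2", 4.0, 6.0, 25, "female"), ("q4", 9.0, 2.0, 28, "female")],
]
query = {
    "range": (3.0, 4.0, 7.0, 7.0),
    "age_min": 20,
    "age_max": 30,
    "gender": "female",
}
print(run_mapreduce(partitions, query))  # {'result': ['q2', 'q3']}
```

The user-written parts are only map_fn and reduce_fn; the loop in run_mapreduce plays the role of the runtime system, which in a real deployment also handles parallelization, failures, and data movement.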
[Figure 1: objects q1 to q12, each labeled with (age, gender), and a query region labeled queryRange. Labels: q1 (25, male), q2 (25, female), q3 (21, female), q4 (28, female), q5 (21, female), q6 (35, male), q7 (23, male), q8 (18, male), q9 (21, male), q10 (28, male), q11 (27, male), q12 (21, male).]
Figure 1 An example of searching the non-spatial
information in queryRange
To solve the problem, we propose a range search
algorithm using the Hilbert R-tree [2]. The Hilbert
R-tree is a variant of the R-tree that utilizes the
Hilbert curve, a space-filling curve shown to preserve
spatial locality most effectively, to guide data
insertion. The tree nodes are sorted by the Hilbert
value. Each query
ISBN: 978-0-9891305-4-7 ©2014 SDIWC 159