Scalable Spatio-temporal Indexing and Querying over a
Document-oriented NoSQL Store
Nikolaos Koutroumanis
Department of Digital Systems
University of Piraeus
Piraeus, Greece
koutroumanis@unipi.gr
Christos Doulkeridis
Department of Digital Systems
University of Piraeus
Piraeus, Greece
cdoulk@unipi.gr
ABSTRACT
In this paper, we provide an in-depth study of the performance
of spatio-temporal queries in document-oriented NoSQL stores.
Existing NoSQL stores provide limited support for spatial data
and (quite often) no native support for spatio-temporal data. As
a result, the performance of query execution over large collec-
tions of spatio-temporal data is often suboptimal. We present
an approach for indexing spatio-temporal data, which is applica-
ble to any NoSQL store that provides key-based access to data
without modifications to its code, and we show how to generate
data partitions that preserve data locality. Moreover, we show
the impact of indexing and partitioning on the number of cluster
nodes that serve a query, and we discuss the advantages and
disadvantages for different applications. We adopt a methodology
for the evaluation of spatio-temporal range queries, which can
serve as a benchmark. In our experiments, we focus on MongoDB
(as a representative NoSQL store that provides spatial support)
and we study the impact of indexing spatio-temporal data on
performance, using both real-life and synthetic data sets in a
medium-sized cluster.
1 INTRODUCTION
Big spatio-temporal data sets are collected every day at unprecedented
rates [15, 17], due to emergent applications, such as fleet
management solutions, surveillance systems in maritime and
aviation, human and animal tracking, IoT sensor feeds, location-
based web search, and social networks with geotagged content.
These applications generate huge volumes of positional infor-
mation represented as points, which require scalable storage
and retrieval, so that data analysis techniques can be applied
to discover hidden spatio-temporal patterns. As a result, scal-
able spatio-temporal data management is a challenging research
topic, and efficient solutions are required for storage, indexing
and querying.
NoSQL stores [4, 7] constitute the state of the art in scalable
storage to date. However, while an increasing number of NoSQL
stores have recently added support for spatial data, this is
seldom the case for spatio-temporal data. In fact, even spatial
data access methods are not always optimized in today’s
mainstream NoSQL stores. While most relational DBMSs have
adopted R-trees [11] (or their variants [2, 16]) for efficient spatial
indexing, NoSQL stores with spatial support adopt GeoHashes to
map spatial data to one-dimensional (1D) values, which are then
indexed using traditional 1D indexes, such as B-trees [6] (see
Table 1). Our conjecture is that this decision relates to the cost of
building and maintaining a distributed R-tree. Consequently, the
performance of existing solutions is suboptimal when faced with
the challenge of efficient and scalable retrieval of spatio-temporal
data.

Database                          Spatial Indexing
RDBMS
  PostgreSQL (PostGIS extension)  R-tree
  MySQL                           R-tree
  Oracle                          R-tree
  MariaDB                         R-tree
  SQL Server                      B-tree
  SQLite (SpatiaLite extension)   B-tree
NoSQL
  MongoDB                         B-tree
  Redis (Geo API)                 Sorted Set
  DynamoDB                        B-tree
  Elasticsearch                   BKD-tree
  Neo4J                           B+Tree
Table 1: Spatial support in the most popular relational and
NoSQL data stores

© 2021 Copyright held by the owner/author(s). Published in Proceedings of the
24th International Conference on Extending Database Technology (EDBT), March
23-26, 2021, ISBN 978-3-89318-084-4 on OpenProceedings.org.
Distribution of this paper is permitted under the terms of the Creative Commons
license CC-by-nc-nd 4.0.
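To make the GeoHash mapping mentioned above concrete, the following minimal Python sketch encodes a (latitude, longitude) pair into a GeoHash string by interleaving longitude and latitude bisection bits and emitting one base-32 character per five bits. It illustrates the general technique only and is not the implementation used by any particular store; the function name geohash_encode and the precision parameter are our own choices for this example.

```python
# Minimal GeoHash encoder (illustrative sketch, not a store's internal code).
BASE32 = "0123456789bcdefghjkmnpqrstuvwxyz"  # standard GeoHash alphabet


def geohash_encode(lat: float, lon: float, precision: int = 8) -> str:
    """Encode a point as a GeoHash string of `precision` characters."""
    lat_lo, lat_hi = -90.0, 90.0
    lon_lo, lon_hi = -180.0, 180.0
    chars = []
    bits = 0          # accumulates the current 5-bit group
    bit_count = 0
    use_lon = True    # bits alternate, starting with longitude

    while len(chars) < precision:
        if use_lon:
            mid = (lon_lo + lon_hi) / 2
            if lon >= mid:
                bits = (bits << 1) | 1
                lon_lo = mid
            else:
                bits <<= 1
                lon_hi = mid
        else:
            mid = (lat_lo + lat_hi) / 2
            if lat >= mid:
                bits = (bits << 1) | 1
                lat_lo = mid
            else:
                bits <<= 1
                lat_hi = mid
        use_lon = not use_lon
        bit_count += 1
        if bit_count == 5:  # five bits make one base-32 character
            chars.append(BASE32[bits])
            bits = 0
            bit_count = 0
    return "".join(chars)


if __name__ == "__main__":
    print(geohash_encode(42.6, -5.6, precision=5))  # prints "ezs42"
```

Because every additional character refines the same cell, all points inside a cell share its string as a prefix, so a range scan over a sorted 1D index can retrieve them. Points that are close in space but fall on opposite sides of a cell boundary may nevertheless receive very different keys.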
Our work is motivated by real-life applications, revolving
around fleet management operators in the urban domain, which
collect large volumes of positional data from GPS-equipped
vehicles daily. The specific use cases that are supported by our work
relate to exploratory analysis of historical routes, using multiple
spatio-temporal queries of varying granularity. The retrieved
trajectories are analyzed for fleet cost reduction (by analyzing
the fuel consumption of historical routes), intelligent routing, as
well as for discovering movement patterns. The challenge is to
provide a scalable storage and spatio-temporal querying solution
for large volumes of historical mobility data. Unfortunately,
existing industrial solutions are not optimized for spatio-temporal
querying at scale; thus, fleet management operators apply data
analysis techniques only on recent subsets of their historical
database, while older data is kept in cold storage.
Motivated by these limitations, in this application paper, we
provide an in-depth study of querying spatio-temporal data at
scale, focusing on a document-oriented NoSQL store, namely
MongoDB. The choice of MongoDB is justified by its wide
popularity among big data developers, and its maturity compared
to competing technologies. We explain the internal
details of indexing and sharding, focusing on how spatial data is
supported, and eventually design a solution for spatio-temporal
data using the built-in indexes of MongoDB. Then, we propose
an alternative approach that uses the Hilbert space-filling curve
(which has been shown to have good clustering properties [14])
to generate one-dimensional (1D) keys, which facilitate the
indexing of spatio-temporal data and help preserve data locality
in the nodes of the MongoDB cluster. Moreover, this approach
can be implemented on top of MongoDB (and other key-based
NoSQL stores), and is thus directly applicable to any application.
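The idea of deriving 1D keys from the Hilbert curve can be sketched as follows. The code uses the standard iterative conversion from a grid cell (x, y) to its position along a 2D Hilbert curve. The grid resolution (order), the helper names to_cell and hilbert_index, and the restriction to the two spatial dimensions are assumptions made for illustration; this section does not specify how the temporal dimension enters the key, so it is left out here.

```python
# Illustrative 2D Hilbert key generation (a sketch under stated assumptions).


def hilbert_index(order: int, x: int, y: int) -> int:
    """Map cell (x, y) of a 2^order x 2^order grid to its Hilbert position."""
    n = 1 << order
    d = 0
    s = n >> 1
    while s > 0:
        rx = 1 if x & s else 0
        ry = 1 if y & s else 0
        d += s * s * ((3 * rx) ^ ry)
        # Rotate/flip the quadrant so the sub-curve has canonical orientation.
        if ry == 0:
            if rx == 1:
                x = n - 1 - x
                y = n - 1 - y
            x, y = y, x
        s >>= 1
    return d


def to_cell(lon: float, lat: float, order: int) -> tuple:
    """Map a lon/lat pair to a grid cell (hypothetical uniform world grid)."""
    n = 1 << order
    x = min(n - 1, int((lon + 180.0) / 360.0 * n))
    y = min(n - 1, int((lat + 90.0) / 180.0 * n))
    return x, y


if __name__ == "__main__":
    order = 16  # assumed resolution: 65536 x 65536 cells
    x, y = to_cell(23.65, 37.94, order)  # a point near Piraeus
    print(hilbert_index(order, x, y))
```

Consecutive positions along the curve always correspond to adjacent grid cells, which is why contiguous key ranges tend to cover compact spatial regions and can be assigned to the same node.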
Industrial Paper
Series ISSN: 2367-2005. DOI: 10.5441/002/edbt.2021.71