Contents lists available at ScienceDirect
ISPRS Journal of Photogrammetry and Remote Sensing
journal homepage: www.elsevier.com/locate/isprsjprs
Exploration of OpenStreetMap missing built-up areas using twitter hierarchical clustering and deep learning in Mozambique
Hao Li a,⁎, Benjamin Herfort a, Wei Huang b,c, Mohammed Zia a, Alexander Zipf a
a GIScience Chair, Institute of Geography, Heidelberg University, 69120 Heidelberg, Germany
b Ministry of Transportation Ontario, Toronto, Ontario, Canada
c Department of Civil Engineering, Ryerson University, Toronto, ON, Canada
ARTICLE INFO
Keywords:
Volunteered geographical information
OpenStreetMap
Data quality
Twitter
Hierarchical DBSCAN
Deep learning
Humanitarian mapping
ABSTRACT
Accurate and detailed geographical information on human activity patterns plays an essential role in the response to
natural disasters. Volunteered geographical information, in particular OpenStreetMap (OSM), shows great potential for
providing knowledge of human settlements to support humanitarian aid, although the availability and quality of OSM data
remain a major concern. Most existing work on assessing OSM data quality focuses on either extrinsic or intrinsic
analysis, which, to a certain degree, is insufficient for humanitarian mapping scenarios. This paper aims to explore
OSM missing built-up areas from an integrative perspective of social sensing and remote sensing. First, the
hierarchical DBSCAN clustering algorithm is applied to generate clusters of geo-tagged tweets as proxies of human active
regions. Then, a deep learning based model fine-tuned on existing OSM data is proposed to further map the missing
built-up areas. Hit by Cyclones Idai and Kenneth in 2019, the Republic of Mozambique is selected as the study area to
evaluate the proposed method at a national scale. As a result, 13 OSM missing built-up areas are identified and mapped
with an overall accuracy of over 90%, which is competitive with state-of-the-art products and confirms the
effectiveness of the proposed method.
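The tweet clustering step summarized above can be illustrated with a small, dependency-free sketch. It is not the authors' implementation: it builds the minimum spanning tree over mutual reachability distances (the core structure behind hierarchical DBSCAN) and then takes a single flat cut of that hierarchy at a fixed distance `eps_km`, instead of the stability-based cluster extraction that full HDBSCAN performs. The function names (`cluster_tweets`, `core_distances`, `mst_edges`, `flat_cut`), the haversine distance, and all parameter values are illustrative assumptions.

```python
import math
from collections import defaultdict


def haversine_km(p, q):
    """Great-circle distance in km between two (lat, lon) points given in degrees."""
    lat1, lon1, lat2, lon2 = map(math.radians, (*p, *q))
    a = (math.sin((lat2 - lat1) / 2) ** 2
         + math.cos(lat1) * math.cos(lat2) * math.sin((lon2 - lon1) / 2) ** 2)
    return 2 * 6371.0 * math.asin(math.sqrt(a))


def core_distances(points, min_samples):
    """Distance from each point to its min_samples-th nearest neighbour (self counted)."""
    cores = []
    for p in points:
        d = sorted(haversine_km(p, q) for q in points)
        cores.append(d[min(min_samples, len(points)) - 1])
    return cores


def mst_edges(points, cores):
    """Prim's MST over mutual reachability: max(core_u, core_v, d(u, v))."""
    n = len(points)
    in_tree, best, parent = [False] * n, [math.inf] * n, [-1] * n
    best[0] = 0.0
    edges = []
    for _ in range(n):
        u = min((i for i in range(n) if not in_tree[i]), key=lambda i: best[i])
        in_tree[u] = True
        if parent[u] >= 0:
            edges.append((best[u], parent[u], u))
        for v in range(n):
            if not in_tree[v]:
                mr = max(cores[u], cores[v], haversine_km(points[u], points[v]))
                if mr < best[v]:
                    best[v], parent[v] = mr, u
    return edges


def flat_cut(n, edges, eps_km, min_cluster_size):
    """Drop MST edges longer than eps_km; small components become noise (-1)."""
    root = list(range(n))

    def find(i):
        while root[i] != i:
            root[i] = root[root[i]]  # path halving
            i = root[i]
        return i

    for w, a, b in edges:
        if w <= eps_km:
            root[find(a)] = find(b)
    groups = defaultdict(list)
    for i in range(n):
        groups[find(i)].append(i)
    labels, next_label = [-1] * n, 0
    for members in sorted(groups.values(), key=min):
        if len(members) >= min_cluster_size:
            for i in members:
                labels[i] = next_label
            next_label += 1
    return labels


def cluster_tweets(points, min_samples=3, min_cluster_size=3, eps_km=2.0):
    """Hypothetical helper: label geo-tagged tweets by cluster (-1 = noise)."""
    cores = core_distances(points, min_samples)
    return flat_cut(len(points), mst_edges(points, cores), eps_km, min_cluster_size)


# Two tight groups of made-up tweet locations plus one isolated tweet.
pts = [(-25.970, 32.570), (-25.971, 32.571), (-25.970, 32.572), (-25.972, 32.570),
       (-19.840, 34.840), (-19.841, 34.841), (-19.840, 34.842), (-19.842, 34.840),
       (-15.000, 40.000)]
labels = cluster_tweets(pts)
print(labels)  # [0, 0, 0, 0, 1, 1, 1, 1, -1]
```

The O(n²) Prim step keeps the sketch short but would be far too slow for national-scale tweet volumes; a library implementation of HDBSCAN (for example `sklearn.cluster.HDBSCAN`, available since scikit-learn 1.3) would be the practical choice there.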
1. Introduction
Over the last decades, Volunteered Geographic Information (VGI)
has been collected in a more detailed, dynamic, and manifold way than
ever before from heterogeneous data sources, such as location-based
services, global positioning systems (GPS), high-resolution earth
observation data, and crowdsourced geographic information (Goodchild,
2007). OpenStreetMap (OSM) is considered the most active and widely
used VGI platform. However, its reliability and accessibility remain
variable due to the high diversity of volunteers’ mapping behavior
(Barron et al., 2014). Data quality is regarded as the first topic that
suggests itself to anyone encountering VGI for the very first time
(Goodchild and Glennon, 2010). Therefore, exploring the quality and
accessibility of OSM data calls for further research towards
sophisticated methods that integrate multiple social and geographical
perspectives. Better quality awareness is essential for improving data
quality and boosting the application of OSM data in general.
Among the existing works investigating the quality of OSM data,
there are two main streams. One common approach is to compare
OSM data with authoritative reference data sets (Fan et al., 2014;
Zielstra et al., 2013; Neis et al., 2012; Mooney and Corcoran, 2012),
which are collected by federal agencies or commercial map providers.
However, the acquisition of such reference data sets depends heavily on
socio-economic factors (e.g., time, costs, and human labor restrictions),
which further limits the application of such extrinsic analysis.
Alternatively, intrinsic data analysis has been explored by examining
the historical data of OSM, where intrinsic indicators show great
potential as alternative measures of OSM data quality (Barron
et al., 2014; Zhang et al., 2018; Jackson et al., 2013; Ostermann and
Spinsanti, 2011). In a data-sparse scenario where most settlement
and street features are simply missing from OSM, the established
approaches are no longer adequate due to a lack of either reference
or historical data. Therefore, robust and efficient quality indicators
are needed that can be easily generated from widely available open
geospatial data.
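To make the idea of an extrinsic comparison concrete, the following hypothetical sketch computes a per-cell completeness ratio between OSM and a reference data set and flags cells where OSM falls below a threshold. The grid-cell identifiers, the use of built-up area as the measure, the 0.5 threshold, and the function names are assumptions for illustration, not values or methods taken from the cited studies. The sketch also makes the limitation discussed above visible: no ratio can be computed for a cell that lacks reference data.

```python
from typing import Dict, List


def completeness_ratios(osm_area: Dict[str, float],
                        ref_area: Dict[str, float]) -> Dict[str, float]:
    """Ratio of OSM built-up area to reference built-up area per grid cell.

    Cells absent from the reference, or with zero reference area, get no
    ratio: an extrinsic indicator is undefined without reference data.
    """
    ratios = {}
    for cell, ref in ref_area.items():
        if ref <= 0:
            continue
        ratios[cell] = osm_area.get(cell, 0.0) / ref
    return ratios


def flag_incomplete(ratios: Dict[str, float], threshold: float = 0.5) -> List[str]:
    """Cells whose OSM coverage falls below `threshold` of the reference."""
    return sorted(cell for cell, r in ratios.items() if r < threshold)


osm = {"c1": 90.0, "c2": 10.0, "c5": 40.0}  # hypothetical areas, e.g. in hectares
ref = {"c1": 100.0, "c2": 100.0, "c3": 50.0, "c4": 0.0}
ratios = completeness_ratios(osm, ref)
print(ratios)                   # {'c1': 0.9, 'c2': 0.1, 'c3': 0.0}
print(flag_incomplete(ratios))  # ['c2', 'c3']
```

Cell `c5` exists only in OSM and `c4` has no reference built-up area, so neither receives a ratio; in a data-sparse region where reference data is missing for most cells, an indicator of this kind simply cannot be evaluated.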
With the rapidly growing need for disaster response worldwide, we
have witnessed increasing demands for accurate
geographical information on the spatial distribution of human
https://doi.org/10.1016/j.isprsjprs.2020.05.007
Received 26 November 2019; Received in revised form 21 April 2020; Accepted 10 May 2020
⁎ Corresponding author.
E-mail addresses: hao.li@uni-heidelberg.de (H. Li), herfort@uni-heidelberg.de (B. Herfort), huangweibuct@gmail.com (W. Huang),
zia@uni-heidelberg.de (M. Zia), zipf@uni-heidelberg.de (A. Zipf).
ISPRS Journal of Photogrammetry and Remote Sensing 166 (2020) 41–51
0924-2716/ © 2020 International Society for Photogrammetry and Remote Sensing, Inc. (ISPRS). Published by Elsevier B.V. All rights reserved.