ISPRS Journal of Photogrammetry and Remote Sensing
journal homepage: www.elsevier.com/locate/isprsjprs

Exploration of OpenStreetMap missing built-up areas using twitter hierarchical clustering and deep learning in Mozambique

Hao Li a,*, Benjamin Herfort a, Wei Huang b,c, Mohammed Zia a, Alexander Zipf a

a GIScience Chair, Institute of Geography, Heidelberg University, 69120 Heidelberg, Germany
b Ministry of Transportation Ontario, Toronto, Ontario, Canada
c Department of Civil Engineering, Ryerson University, Toronto, ON, Canada

ARTICLE INFO

Keywords:
Volunteered geographical information
OpenStreetMap
Data quality
Twitter
Hierarchical DBSCAN
Deep learning
Humanitarian mapping

ABSTRACT

Accurate and detailed geographical information that captures human activity patterns plays an essential role in responding to natural disasters. Volunteered geographical information, in particular OpenStreetMap (OSM), shows great potential for providing knowledge of human settlements to support humanitarian aid, but the availability and quality of OSM data remain a major concern. Most existing work on assessing OSM data quality focuses on either extrinsic or intrinsic analysis, which is not fully sufficient for humanitarian mapping scenarios. This paper aims to explore OSM missing built-up areas from an integrative perspective of social sensing and remote sensing. First, by applying a hierarchical DBSCAN clustering algorithm, clusters of geo-tagged tweets are generated as proxies for regions of human activity. Then, a deep-learning-based model fine-tuned on existing OSM data is proposed to further map the missing built-up areas. Hit by Cyclones Idai and Kenneth in 2019, the Republic of Mozambique is selected as the study area to evaluate the proposed method at a national scale.
As a result, 13 OSM missing built-up areas are identified and mapped with an overall accuracy of over 90%, which is competitive with state-of-the-art products and confirms the effectiveness of the proposed method.

1. Introduction

Over the last decades, Volunteered Geographic Information (VGI) has been collected in a more detailed, dynamic, and manifold way than ever before from heterogeneous data sources, such as location-based services, global positioning systems (GPS), high-resolution earth observation data, and crowdsourced geographic information (Goodchild, 2007). OpenStreetMap (OSM) is considered the most active and widely used VGI platform. However, its reliability and accessibility remain variable owing to the high diversity of volunteers’ mapping behavior (Barron et al., 2014). Data quality is regarded as the first topic that suggests itself to anyone encountering VGI for the first time (Goodchild and Glennon, 2010). Therefore, exploring the data quality and accessibility of OSM data requires further research towards developing sophisticated methods that integrate multiple social and geographical perspectives. Better quality awareness is essential to improving data quality and broadening the application of OSM data in general.

Existing work on the quality of OSM data follows two main streams. One common approach is to compare OSM data with authoritative reference data sets (Fan et al., 2014; Zielstra et al., 2013; Neis et al., 2012; Mooney and Corcoran, 2012), which are collected by federal agencies or commercial map providers. However, the acquisition of such reference data sets depends heavily on socio-economic factors (e.g., time, costs, and human labor restrictions), which limits the application of this extrinsic analysis approach.
In this context, intrinsic data analysis has been explored by looking into the historical data, where intrinsic indicators show great potential as alternative measures of OSM data quality (Barron et al., 2014; Zhang et al., 2018; Jackson et al., 2013; Ostermann and Spinsanti, 2011). In a data-sparse scenario where most settlement and street features are simply missing from OSM data, the established approaches are no longer adequate owing to a lack of either reference or historical data. Therefore, robust and efficient quality indicators are needed, and they should be easy to generate from widely available open geospatial data.

https://doi.org/10.1016/j.isprsjprs.2020.05.007
Received 26 November 2019; Received in revised form 21 April 2020; Accepted 10 May 2020
* Corresponding author.
E-mail addresses: hao.li@uni-heidelberg.de (H. Li), herfort@uni-heidelberg.de (B. Herfort), huangweibuct@gmail.com (W. Huang), zia@uni-heidelberg.de (M. Zia), zipf@uni-heidelberg.de (A. Zipf).
ISPRS Journal of Photogrammetry and Remote Sensing 166 (2020) 41–51
0924-2716/ © 2020 International Society for Photogrammetry and Remote Sensing, Inc. (ISPRS). Published by Elsevier B.V. All rights reserved.

With the ever faster growing need for disaster response worldwide, we have witnessed increasing demand for accurate geographical information on the spatial distribution of human