Twitter as an indicator for whereabouts of people? Correlating Twitter with UK census data

Enrico Steiger, René Westerholt, Bernd Resch, Alexander Zipf
GIScience Research Group, Institute of Geography, Heidelberg University, Germany

Article history: Received 2 October 2014; received in revised form 8 September 2015; accepted 17 September 2015

Keywords: Crowdsourcing of human activities; LBSN; Twitter; Spatial autocorrelation; Semantic topic modeling

Abstract

Detailed knowledge regarding the whereabouts of people and their social activities in urban areas at high spatial and temporal resolution is still scarce. Thus, the spatiotemporal analysis of Location Based Social Networks (LBSN) has great potential regarding the ability to sense spatial processes and to gain knowledge about urban dynamics, especially with respect to collective human mobility behavior. The objective of this paper is to explore the semantic association between georeferenced tweets and their respective spatiotemporal whereabouts. We apply a semantic topic model classification and spatial autocorrelation analysis to detect tweets indicating specific human social activities. We correlate the observed tweet patterns with official census data for the case study of London in order to underline the significance and reliability of Twitter data. Our empirical results for semantically and spatiotemporally clustered tweets show an overall strong positive correlation with workplace population census data, making such tweets a good indicator and representative proxy for analyzing workplace-based activities.

© 2015 Elsevier Ltd. All rights reserved.

1. Introduction

Cities are multifunctional complex systems serving as major hubs for a number of human social activities.
With more than half of the world's population living in urban areas and continuing urban growth (United Nations Population Fund, 2008), the capability to provide viable service infrastructure (roads, public transport, energy supplies, etc.) for the majority of people is a rising challenge. The characterization of urban structures can facilitate urban and transportation planning processes by providing valuable information, which helps to predict the increased pressure on existing urban infrastructures. Regular commuting from workplaces to places of residence, and activities originating from these areas, are major examples of daily routines within urban areas, influencing human mobility and affecting transportation planning. In the UK in 2013, a person on average made 145 trips with 19% of all trip purposes related to business and commuting activities (Department for Transport, 2014). Determining the frequency and spatial distribution of travel origins and destinations for every trip purpose is a principal quantitative task currently carried out through mobility surveys (Morris, Humphrey, & Tipping, 2014). However, such surveys are expensive in terms of the required labor input and usually lead to limited sample sizes. Thus, the investigation of typically larger spatiotemporal human activity clusters obtained from crowdsourced information may help to understand commuting patterns and reveal specific urban structures such as workplace concentrations.

In this context, emerging, inexpensive and widespread sensor technologies have created new possibilities to infer mobility data for exploring urban structures and dynamics. The growing availability of GPS-equipped mobile devices with broadband internet access allows users to actively participate and create content through mobile applications and location-based services (ITU, 2014).
Georeferenced Twitter data in particular offers a promising opportunity to understand geographic processes inside online social networks. The enormous potential of interactive social media platforms like Twitter has been increasingly recognized by numerous research domains in recent years. Although there is a growing body of research using Twitter data to analyze urban processes, empirical research validating human social activities that reveal urban structures and human mobility patterns from crowdsourced information remains scarce (Resch, Beinat, Zipf, & Boher, 2012). In a previous study we introduced a semantic and spatial analysis method, through which we were able to extract human mobility flows from uncertain Twitter data (Steiger, Ellersiek, & Zipf, 2014). However, it remains to be investigated whether we can find similar semantic layers that represent collective human behavior in co-occurrence with underlying social activity. Therefore, research question (RQ1) investigates the possibility of exploring urban structures through characterizing spatiotemporal and semantic patterns of human social activities. Hence, we extract topics covering work-related and home-related activities that reflect typical collective human behavior (e.g., city-scale human mobility). Thus, the

Computers, Environment and Urban Systems 54 (2015) 255–265
Corresponding author at: Institute of Geography, Heidelberg University, Berliner Straße 48, D-69120 Heidelberg, Germany. E-mail address: enrico.steiger@geog.uni-heidelberg.de (E. Steiger).
http://dx.doi.org/10.1016/j.compenvurbsys.2015.09.007
0198-9715/© 2015 Elsevier Ltd. All rights reserved.