Inferring building functions from a probabilistic model using public transportation data

Chen Zhong a,*, Xianfeng Huang a,b, Stefan Müller Arisona c, Gerhard Schmitt a, Michael Batty d

a Future Cities Laboratory, Department of Architecture, ETH Zurich, 8092 Zurich, Switzerland
b State Key Lab of Information Engineering in Surveying, Mapping and Remote Sensing, Wuhan University, 430079 Wuhan, China
c Institute of 4D Technologies, University of Applied Sciences and Arts Northwestern Switzerland FHNW, 5210 Windisch, Switzerland
d Centre for Advanced Spatial Analysis, University College London, 90 Tottenham Court Road, W1N 6TR London, England, United Kingdom

Article info

Article history:
Received 19 March 2013
Received in revised form 14 July 2014
Accepted 16 July 2014

Keywords:
Bayesian model
Spatial statistics
Building function
Activity
Smart card data

Abstract

Cities are complex systems. They contain different functional areas originally defined by planning and then reshaped by actual needs and use by the inhabitants. Estimating the functions of urban space is of significant importance for detecting urban problems, evaluating planning strategies, and supporting policy making. In light of the potential of data mining and spatial analysis techniques for urban analysis, this paper proposes a method to infer urban functions at the building level using transportation data obtained from surveys and smart card systems. Specifically, we establish a two-step framework making use of the spatial relationships between trips, stops, and buildings. Firstly, information about the travel purposes for daily activities is deduced using passengers’ mobility patterns based on a probabilistic Bayesian model. Secondly, building functions are inferred by linking daily activities to the buildings surrounding the stops based on spatial statistics. We demonstrate the proposed method using large-scale public transportation data from two areas of Singapore.
The method is applied to identify functions at the level of individual buildings. The results are verified against the master plan, street view imagery, and investigation data, and limitations are identified. Our work shows that the presented method is applicable in practice with good accuracy. In a broader context, it shows the effectiveness of applying integrated techniques to combine multi-source data in order to gain insights into social activities and complex urban space.

© 2014 Elsevier Ltd. All rights reserved.

1. Introduction

Urban systems are composed of many different forms of functional areas, which interact with one another to generate the complexity that defines a city. These functional areas are historically associated with many urban processes, some related to the institutions that are used to support planning but most being shaped by individuals’ actual needs through processes of bottom-up change. In this spirit, Jane Jacobs (1961) described cities as ‘problems of organized complexity’. Taking a small park as an example, she argued that “... even this partial influence of the park’s design upon the park’s use depends, in turn, on who is around to use the park and when, and this in turn depends on uses of the city outside the park itself...”. Similarly, Rodrigue (2013) defines land use in two ways. Formal land use refers to its form, pattern, and aspect, while functional land use refers to its socioeconomic description in space. The latter is likely to imply higher levels of dynamic temporal change than the former, as activities change faster than the physical locations and land uses that contain them. As discussed in Green (2007), functional changes in cities are not tied to morphological changes.
In places such as Singapore, it is crucial to understand urban functions and their compatibility with the original Master Plan, which is very important to the development of the urban system. At the same time, the current push to understand the dynamics of urban areas requires costly cross-sectional survey data, which in principle should be used to update information dynamically. As a potential solution to these problems, multiple location data sources, such as GSM traces, Wi-Fi data, GPS traces from taxis, and smart-card data, have only recently become available, and for the first time this is greatly stimulating the use of these “big” data sets for urban analysis. Yuan, Zheng, and Xie (2012) show that regions of different functions in a city can be detected using human mobility data and points-of-interest data. In Roth et al. (2011), the characteristics of a polycentric urban form are defined from the analysis of large-scale, real-time smart-card data from which individuals’ movement patterns can be inferred.

Corresponding author.
Computers, Environment and Urban Systems 48 (2014) 124–137
http://dx.doi.org/10.1016/j.compenvurbsys.2014.07.004