INFORMATION EXTRACTION METHODOLOGY BY WEB SCRAPING FOR SMART CITIES

Using Machine Learning to Train Air Quality Monitor for Smart Cities

CHIA-CHUN CHUNG 1 and TAY-SHENG JENG 2
1,2 Department of Architecture, National Cheng Kung University, Taiwan
1 chungamy0117@gmail.com
2 tsjeng@mail.ncku.edu.tw

Abstract. This paper presents an opportunistic sensing system for air quality monitoring to forecast the implicit factors of air pollution. Opportunistic sensing is performed by web scraping social network services to extract information. The data source for the air quality analysis combines two types of information: explicit and implicit. The objective is to develop an information extraction methodology based on web scraping for smart cities. The application development methodology has potential for solving real-world problems such as air pollution by comparing observations of social activity with data collected in sensor networks.

Keywords. Smart city; open data; web scraping; social media; machine learning.

1. Introduction

With continuously increasing urbanization, an emerging problem is the lack of data, resources, and technical capability to manage urban crises in areas such as energy, health, transportation, water, and sanitation. Traditionally, urban environmental problems have been resolved primarily through professional expertise, which results in the formulation of strategic policies. To resolve these urban crises, some governments are taking an extra step to make a city “smarter”. A particular strategy of a smart city is to use open data, with the help of information technology, to respond to citizen demands. For example, in Chicago, tweet sentiment is used to recommend which roads are safer for walkers (Kim, Cha & Sandholm, 2014).

In this paper, our focus is to support the function of air quality monitoring in smart cities. The information extraction from social media is based on IoT sensor networks in smart cities, as shown in Figure 1.

T. Fukuda, W. Huang, P. Janssen, K. Crolla, S. Alhadidi (eds.), Learning, Adapting and Prototyping, Proceedings of the 23rd International Conference of the Association for Computer-Aided Architectural Design Research in Asia (CAADRIA) 2018, Volume 2, 515-524. © 2018 and published by the Association for Computer-Aided Architectural Design Research in Asia (CAADRIA) in Hong Kong.