INFORMATION EXTRACTION METHODOLOGY BY WEB SCRAPING FOR SMART CITIES
Using Machine Learning to Train Air Quality Monitor for Smart Cities
CHIA-CHUN CHUNG¹ and TAY-SHENG JENG²
¹,²Department of Architecture, National Cheng Kung University, Taiwan
¹chungamy0117@gmail.com
²tsjeng@mail.ncku.edu.tw
Abstract. This paper presents an opportunistic sensing system
for air quality monitoring to forecast the implicit factors of air
pollution. Opportunistic sensing is performed by scraping social
network services to extract information. The data source
for the air quality analysis combines two types of information:
explicit and implicit. The objective is to develop an
information extraction methodology by web scraping for smart cities.
The application development methodology has potential for solving
real-world problems such as air pollution by comparing observations
of social activity with data collected by sensor networks.
Keywords. Smart city; open data; web scraping; social media;
machine learning.
1. Introduction
As urbanization continues to increase, an emerging problem is the
lack of data, resources, and technical capability to manage urban crises such
as energy, health, transportation, water and sanitation. Traditionally,
urban environmental problems have been addressed primarily through professional
expertise, which results in the formulation of strategic policies. To resolve
urban crises, some governments are taking an extra step to make a city “smarter”.
A particular strategy of a smart city is to use open data with the help of information
technology to respond to citizen demands. For example, in Chicago, tweet
sentiments are used to recommend which road is safer for walkers (Kim, Cha &
Sandholm, 2014). In this paper, our focus is to support the function of air quality
monitoring in smart cities. Information extraction from social media is based
on IoT sensor networks in smart cities, as shown in Figure 1.
T. Fukuda, W. Huang, P. Janssen, K. Crolla, S. Alhadidi (eds.), Learning, Adapting and Prototyping,
Proceedings of the 23rd International Conference of the Association for Computer-Aided Architectural
Design Research in Asia (CAADRIA) 2018, Volume 2, 515-524. © 2018 and published by the Association
for Computer-Aided Architectural Design Research in Asia (CAADRIA) in Hong Kong.