Real-Time Social Network Data Mining for
Predicting the Path for a Disaster
Saloni Jain, Brett Adams Duncan, and Yanqing Zhang
Department of Computer Science, Georgia State University, Atlanta, USA
Email: {sjain5, bduncan7}@student.gsu.edu, yzhang@gsu.edu
Ning Zhong
Department of Life Science and Informatics, Maebashi Institute of Technology, Maebashi-City, Japan
Email: zhong@maebashi-it.ac.jp
Zejin Ding
Hewlett-Packard Company, 5555 Windward Pkwy, Alpharetta, USA
Email: dingzejin@gmail.com
Abstract—Unlike social networks such as Twitter, traditional communication channels like news channels cannot provide spontaneous information about disasters. This work proposes a framework that mines real-time disaster data from Twitter to predict the path that a disaster, such as a tornado, will take. Twitter users act as sensors, providing useful information about the disaster by posting first-hand experiences, warnings, or the location of the disaster. The steps involved in the framework are data collection, data preprocessing, geo-location tagging, data filtering, and extrapolation of the disaster curve to predict susceptible locations. The framework is validated by applying regression to past events and comparing the results with government warnings. This framework has the potential to be developed into a full-fledged system that provides people with instantaneous warnings about disasters via news channels or broadcasts.
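The filtering and extrapolation steps of the framework can be illustrated with a minimal Python sketch. It is not the authors' implementation: it assumes a hypothetical tweet schema (a dict with `text`, `time` in hours since the first report, and `coords` as a (latitude, longitude) pair or `None`), a hypothetical keyword set, and it models the disaster path by fitting latitude and longitude separately as low-degree polynomials in time with NumPy's least-squares `np.polyfit`. The abstract does not specify the regression model, so this is one plausible choice.

```python
import re

import numpy as np

# Hypothetical keyword set; the framework's actual filtering criteria are not given here.
DISASTER_KEYWORDS = {"tornado", "twister", "funnel"}


def filter_tweets(tweets, keywords=DISASTER_KEYWORDS):
    """Keep geo-tagged tweets whose text contains at least one disaster keyword.

    Each tweet is assumed (hypothetically) to be a dict with 'text',
    'time' (hours since the first report) and 'coords' = (lat, lon) or None.
    """
    kept = []
    for tweet in tweets:
        # Lower-case and split into alphabetic tokens, so "#Tornado!" -> "tornado".
        tokens = set(re.findall(r"[a-z]+", tweet["text"].lower()))
        if tweet.get("coords") is not None and tokens & keywords:
            kept.append(tweet)
    return kept


def predict_path(tweets, future_times, degree=1):
    """Fit lat(t) and lon(t) by least squares and extrapolate to future times."""
    if len(tweets) <= degree:
        raise ValueError("need more geo-tagged reports than the polynomial degree")
    times = np.array([tweet["time"] for tweet in tweets], dtype=float)
    lats = np.array([tweet["coords"][0] for tweet in tweets], dtype=float)
    lons = np.array([tweet["coords"][1] for tweet in tweets], dtype=float)
    # Independent polynomial regressions, one per coordinate.
    lat_coeffs = np.polyfit(times, lats, degree)
    lon_coeffs = np.polyfit(times, lons, degree)
    future = np.asarray(future_times, dtype=float)
    # Evaluate both fitted curves at the requested future times.
    return [(float(lat), float(lon))
            for lat, lon in zip(np.polyval(lat_coeffs, future),
                                np.polyval(lon_coeffs, future))]


if __name__ == "__main__":
    reports = [
        {"text": "Huge #tornado on the ground!", "time": 0.0, "coords": (33.0, -85.0)},
        {"text": "Tornado warning here", "time": 1.0, "coords": (33.1, -84.8)},
        {"text": "Great lunch today", "time": 1.5, "coords": (40.0, -70.0)},
        {"text": "Saw a funnel cloud", "time": 2.0, "coords": (33.2, -84.6)},
    ]
    print(predict_path(filter_tweets(reports), [3.0, 4.0]))
```

Fitting each coordinate against time, rather than latitude against longitude, keeps the sketch usable when the path runs roughly north-south. A linear fit (`degree=1`) is a conservative default, because higher-degree polynomials tend to diverge quickly outside the observed time range.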
Index Terms—data mining, disaster computing, real-time
disaster prediction, regression
I. INTRODUCTION
Social media has become an important tool for staying in touch with friends, for marketing the products and services offered by companies, and even for announcements by government agencies and news channels. One of the social networking websites that has gained vast popularity is Twitter. This research work deals with data obtained from Twitter, which is mined to extract useful information for a real-world scenario, namely, disaster path prediction. It is discussed in the next section.
A. Twitter and Its Importance
Twitter is an Online Social Network (OSN) used by
millions of people all over the world. It enables people to
stay connected with their friends, family and colleagues.
With advancements in technology, it has become easier to access Twitter using mobile devices such as iPhones and iPads. Currently, Twitter has 288 million monthly active users, who send an average of 500 million tweets per day [1].
Manuscript received July 23, 2015; revised November 11, 2015.
Twitter has become an important resource for the field of Data Mining because of its many features. It has a wide variety of users, who can represent a sample of the entire population. The revolution in Information and Communication Technology (ICT) has made it possible for billions of people to access social networking sites, giving these sites a wide reach. Users can post messages on the go, which ensures the real-time nature of the messages. Compared to email, this "push" of information is almost instantaneous. Twitter also allows users to search for or filter messages of interest using hashtags, and users are free to follow accounts or join groups that they like. It also caters to its users' privacy by letting them decide whether to post tweets publicly or privately.
Most of the time, people post trivial personal experiences, but sometimes they post messages containing information that is valuable when mined. This information can be about events such as politics, traffic jams, riots, fires, earthquakes, and storms. Therefore, Twitter can also act as a non-traditional medium for obtaining news, since people can tweet newsworthy information. Such messages with news value can be used in early warning detection systems. However, the most important feature for this study is the real-time dissemination of information in the Twitter network. This becomes even more useful given that 80 per cent of users are mobile users [1], who can provide exact geo-location and more up-to-date information.
Journal of Advances in Information Technology Vol. 7, No. 2, May 2016
© 2016 J. Adv. Inf. Technol.
doi: 10.12720/jait.7.2.81-87
B. Data Mining for Disaster Management
Data Mining plays a crucial role in extracting useful information from social media, because much of the information in social media consists of trivial personal data that is not very enlightening or useful to a large group of people. Data mining is used for analysis in many areas. For