Real-Time Social Network Data Mining for Predicting the Path for a Disaster

Saloni Jain, Brett Adams Duncan, and Yanqing Zhang
Department of Computer Science, Georgia State University, Atlanta, USA
Email: {sjain5, bduncan7}@student.gsu.edu, yzhang@gsu.edu

Ning Zhong
Department of Life Science and Informatics, Maebashi Institute of Technology, Maebashi-City, Japan
Email: zhong@maebashi-it.ac.jp

Zejin Ding
Hewlett-Packard Company, 5555 Windward Pkwy, Alpharetta, USA
Email: dingzejin@gmail.com

Abstract: Traditional communication channels such as news channels cannot provide spontaneous information about disasters, whereas social networks such as Twitter can. This work proposes a framework that mines real-time disaster data from Twitter to predict the path that a disaster, such as a tornado, will take. Twitter users act as sensors, providing useful information about the disaster by posting first-hand experiences, warnings, or the location of a disaster. The steps in the framework are data collection, data preprocessing, geo-location tagging, data filtering, and extrapolation of the disaster curve to predict susceptible locations. The framework is validated by analyzing past events using regression together with the government warnings. This framework has the potential to be developed into a full-fledged system that provides instantaneous warnings to people about disasters via news channels or broadcasts.

Index Terms: data mining, disaster computing, real-time disaster prediction, regression

I. INTRODUCTION

Social media has become an important tool for staying in touch with friends, for marketing the products and services that companies offer, and even for announcements by government agencies and news channels. One social networking website that has gained vast popularity is Twitter. This research work mines data obtained from Twitter to extract useful information for a real-world scenario, namely disaster path prediction.
Twitter is discussed in the next section.

A. Twitter and Its Importance

Twitter is an Online Social Network (OSN) used by millions of people all over the world. It enables people to stay connected with their friends, family, and colleagues. With advances in technology, it has become easier to access Twitter from mobile devices such as iPhones and iPads. Currently, Twitter has 288 million monthly active users, with an average of 500 million tweets sent per day [1].

Twitter has become an important resource for the field of data mining because of its many features. It has a wide variety of users, who can represent a sample of the entire population. The revolution in Information and Communication Technology (ICT) has made it possible for billions of people to access social networking sites, giving these sites a wide reach. Users can post messages on the go, which ensures the real-time nature of the messages. Compared to emails, this “push” of information is almost instantaneous. Twitter also lets users search or filter for messages of interest using hashtags. Users are free to follow or join groups that they like. Twitter also offers its users a measure of security: they can decide whether to post tweets publicly or privately.

Mostly, people post trivial personal experiences, but sometimes they post messages containing information that is valuable when mined. This information can concern events such as political developments, traffic jams, riots, fires, earthquakes, and storms. Twitter can therefore act as a non-traditional news medium, since people can tweet newsworthy information. They can even create messages with news value, which can be used in early warning detection systems. However, the most important feature for this study is the real-time nature of information dissemination in the Twitter network. This feature becomes even more useful given that 80 per cent of users are mobile users [1], who can provide exact geo-location and more up-to-date information.

Manuscript received July 23, 2015; revised November 11, 2015.
Journal of Advances in Information Technology, Vol. 7, No. 2, May 2016. © 2016 J. Adv. Inf. Technol. doi: 10.12720/jait.7.2.81-87

B. Data Mining for Disaster Management

Data mining plays a crucial role in extracting useful information from social media. The reason is that much of the information in social media is trivial personal data that is neither enlightening nor useful to a large group of people. Data mining is used in many areas for analysis. For