Risdianto Irawan et al., International Journal of Emerging Trends in Engineering Research, 8(7), July 2020, 3216 - 3222

ABSTRACT

Natural disasters are expected to increase in number and severity on a global scale. Social media analysis has become an essential tool in natural disaster management for tracking disaster events, their impact, and other critical information. However, the high volume of tweet data produces noise, and not all tweets are relevant for gaining situational awareness during disaster response. This paper presents a disaster-relevance classification for Indonesian-language tweets using machine learning with Naïve Bayes, support vector machine, and logistic regression, with a focus on Twitter data generated during the 2018 Sulawesi Earthquake in Indonesia. With a resulting accuracy of 83.5%, our labeled data can be used for capturing disaster-relevant tweets in any future disaster event in Indonesia.

Key words: Social Media Analysis, Text Mining, Disaster-Relevance Classification, Indonesia

1. INTRODUCTION

Every year, natural disasters such as typhoons, floods, landslides, volcanic eruptions, and earthquakes cause thousands of deaths, billions of dollars of property damage, and severe environmental impact [1]. Indonesia is among the five most at-risk countries in the Asia Pacific region, with a high likelihood of experiencing the next catastrophic natural disaster. On 28 September 2018, a shallow M7.5 earthquake hit Sulawesi, Indonesia, with its epicenter in Donggala Regency, Central Sulawesi. It was followed by a localized tsunami, and many buildings close to the coast were destroyed. The earthquake and tsunami killed an estimated 4,340 people. In every disaster, it is necessary to obtain information about the damage, impact, and needs as quickly as possible.
Governments, disaster management authorities, and humanitarian actors can respond better when they know what people need and what is happening in the affected area.

Social media has been used in many domains, including natural disasters [2] and epidemic disease outbreaks [3]. It provides information for decision-making and serves as an information source before, during, and after major events, including natural disasters. Twitter, acting as a social sensor, provides data and key information during a disaster. Twitter users can actively and immediately express their opinions and feelings on the platform [4]. It allows users to share short messages about the real situation in and from the affected area during the disaster. In countries with good telecommunications infrastructure, such as Indonesia, social media plays a significant role as a platform for sharing news and information. For instance, during the 2018 Sulawesi Earthquake, the hashtags #gempapalu, #prayforpalu, and #prayforindonesia were at the top of Twitter's trending topics, which shows that substantial communication and information sharing took place during the event.

However, Twitter data is noisy, and every event, including natural disasters, attracts many irrelevant tweets. Most tweets are not useful in providing information about the disaster. During a disaster, a huge number of tweets may mention it without being relevant, for instance tweets that use the same hashtag to promote or advertise unrelated content. Therefore, it is necessary to filter and classify disaster-relevant tweets automatically and accurately to enable quick data-driven decision-making. Many disaster-relevance classification studies have contributed substantially to disaster management by producing disaster lexicons that can be used for further research or applied to real disaster management.
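As an illustration of the automatic relevance filtering motivated here, the following is a minimal sketch of a multinomial Naïve Bayes classifier (one of the three algorithm families named in the abstract) written in plain Python. It is not the authors' implementation: the toy tweets, the whitespace tokenizer, the add-one smoothing, and the names `TRAIN`, `tokenize`, `train_naive_bayes`, and `predict` are all assumptions made for this example.

```python
import math
from collections import Counter, defaultdict

# Hypothetical toy tweets invented for illustration (NOT the paper's dataset).
# Note that the same hashtag appears in relevant and irrelevant tweets.
TRAIN = [
    ("gempa kuat di palu banyak bangunan rusak #gempapalu", "relevant"),
    ("korban tsunami butuh air bersih dan tenda #prayforpalu", "relevant"),
    ("jalan ke donggala putus akibat gempa", "relevant"),
    ("promo diskon sepatu murah hari ini #gempapalu", "irrelevant"),
    ("jual pulsa murah promo akhir bulan #prayforpalu", "irrelevant"),
    ("nonton konser malam ini seru banget", "irrelevant"),
]


def tokenize(text):
    """Lowercase and split on whitespace (an assumed, very simple tokenizer)."""
    return text.lower().split()


def train_naive_bayes(examples):
    """Count documents per label and word occurrences per label."""
    doc_counts = Counter()
    word_counts = defaultdict(Counter)
    vocab = set()
    for text, label in examples:
        doc_counts[label] += 1
        for token in tokenize(text):
            word_counts[label][token] += 1
            vocab.add(token)
    return doc_counts, word_counts, vocab


def predict(model, text):
    """Return the label with the highest log-posterior under add-one smoothing."""
    doc_counts, word_counts, vocab = model
    total_docs = sum(doc_counts.values())
    best_label, best_score = None, -math.inf
    for label in sorted(doc_counts):
        score = math.log(doc_counts[label] / total_docs)  # log prior
        total_words = sum(word_counts[label].values())
        for token in tokenize(text):
            if token not in vocab:
                continue  # ignore words never seen in training
            count = word_counts[label][token]
            score += math.log((count + 1) / (total_words + len(vocab)))
        if score > best_score:
            best_label, best_score = label, score
    return best_label


model = train_naive_bayes(TRAIN)
print(predict(model, "diskon sepatu murah #gempapalu"))               # irrelevant
print(predict(model, "gempa di donggala bangunan rusak #gempapalu"))  # relevant
```

In this toy setting the shared hashtag alone does not decide the label; the surrounding words do, which mirrors the advertising-under-a-disaster-hashtag case described above. A realistic pipeline would need far more labeled data and proper evaluation on held-out tweets.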
Related studies on using social media data for disaster analysis have mostly classified disaster relevance for English-language text, whereas the Indonesian language has a different lexicon.

In this paper, we manually analyzed and labeled tweets as disaster-relevant or disaster-irrelevant. The manual labeling was done by disaster practitioners from caribencana.id. The labeled tweets were used as training data for several machine learning algorithms. The most accurate model was then used to label the remaining tweets. This research makes the following contributions: (1) identify the best machine learning algorithm for classifying disaster relevance in Indonesian-language tweets, (2) identify relevant tweets during disaster response in Indonesia and extract key information, (3) provide disaster-relevance tweet data for future research and future disaster events in Indonesia by

Social Media Disaster Relevance Classification for Situation Awareness during Emergency Response in Indonesia

Risdianto Irawan 1, Sani M Isa 2

1 Computer Science Department, BINUS Graduate Program - Master of Computer Science, Bina Nusantara University, Jakarta, 11480, Indonesia, risdianto.irawan@binus.ac.id
2 Computer Science Department, BINUS Graduate Program - Master of Computer Science, Bina Nusantara University, Jakarta, 11480, Indonesia, sani.m.isa@binus.ac.id

ISSN 2347 - 3983
Volume 8. No. 7, July 2020
International Journal of Emerging Trends in Engineering Research
Available Online at http://www.warse.org/IJETER/static/pdf/file/ijeter55872020.pdf
https://doi.org/10.30534/ijeter/2020/55872020