International Journal of Computing and Digital Systems
ISSN (2210-142X)
Int. J. Com. Dig. Sys. 14, No.1 (Sep-2023)
http://dx.doi.org/10.12785/ijcds/140174

Disaster Tweet Classifications Using Hybrid Convolutional Layers and Gated Recurrent Unit

Ricko Anugrah Mulya Pratama 1 and Hilman Ferdinandus Pardede 1,2
1 Graduate School of Computer Science, Faculty of Information Technology, Nusa Mandiri University, Jakarta, Indonesia
1,2 Research Center for Artificial Intelligence and Cyber Security, National Research and Innovation Agency, Bandung, Indonesia

Received 02 Mar. 2023, Revised 22 May 2023, Accepted 31 Jul. 2023, Published 01 Sep. 2023

Abstract: A disaster monitoring system using Twitter data can provide information regarding disaster-prone areas and emergency response information. Several studies have applied machine learning to automatically detect disaster information from Twitter data. The Support Vector Machine (SVM) is one of the most frequently used algorithms for text categorization, but SVM-based text classification is limited by a lack of transparency in its results caused by the high number of dimensions. Long Short-Term Memory (LSTM) is a deep learning technique that is also frequently employed for text categorization, but its processing involves many stages and therefore requires longer computation time. The main idea of the proposed hybrid model is to combine the advantages of a Convolutional Neural Network (CNN) architecture, which is highly reliable for handling high-dimensional data, with Gated Recurrent Units (GRU), which are effective at processing sequential data and compute faster than LSTM. This study uses the NLP Disaster Tweets dataset from Kaggle. The proposed model outperforms at least 12 different types of conventional machine learning algorithms as well as other widely used deep learning models.
The CNN-GRU hybrid model with FastText achieves an accuracy of 83.32%, an F1-score of 81.45%, and an AUC of 83.45%.

Keywords: Disaster Tweets, Classification, Hybrid CNN-GRU, Deep Learning, Natural Language Processing

1. INTRODUCTION

Twitter is a platform where opinions can be found on almost every subject. Among the most widely shared information on Twitter is news about disasters, which users can share with the public in real time [1]. A disaster monitoring system using Twitter data can provide information regarding disaster-prone areas and emergency response information. Disasters are not caused only by natural events; many are caused by human negligence. According to the United Nations International Strategy for Disaster Reduction (UNISDR), the effects of a disaster can include human casualties, property loss, social and economic disruption, and environmental harm [2]. Because of this, many news organizations and disaster relief groups are interested in automating the monitoring of disaster information on Twitter. This requires a machine learning algorithm that can automatically determine whether the text of a tweet refers to a disaster or not.

The main idea of this research is to create a machine learning model that can analyze Twitter text referring to disasters (fires, earthquakes, etc.) to help mobilize emergency response teams quickly. One challenge is that each tweet has its own pattern and sentence length, and its content is not only text but can also include a wide variety of photos, videos, or web links. Another challenge is how well machine learning models can distinguish a genuine disaster tweet from one that is merely humorous or metaphorical. Computer algorithms cannot interpret raw text directly, so Natural Language Processing (NLP) is needed to help computers understand natural human language [3].
In this way, machine learning can learn patterns in textual data. The task addressed in this study is binary classification: the model is designed to determine whether a tweet is related to a disaster or not. For binary classification, many studies have tried different types of classifiers. One commonly used classifier is the Support Vector Machine (SVM). SVM has been widely used in binary text classification studies because it is by nature a binary classifier [4]. SVM is very reliable at finding the best hyperplane by maximizing the margin between classes. However, SVM for text classification is limited by a lack of transparency in its results caused by the high number of dimensions [5]. Among deep learning approaches, Long Short-Term Memory (LSTM) is commonly used for text classification. LSTM is able to learn and retain long-term dependencies [6]. However, LSTM involves many processing stages and therefore requires longer computation time [7].
E-mail: 14207093@nusamandiri.ac.id, hilm003@brin.go.id
https://journal.uob.edu.bh
This computational