(IJACSA) International Journal of Advanced Computer Science and Applications, Vol. 12, No. 6, 2021

Sarcasm Detection in Tweets: A Feature-based Approach using Supervised Machine Learning Models

Arifur Rahaman 1, Ratnadip Kuri 2, Syful Islam 3 *, Md. Javed Hossain 4, Mohammed Humayun Kabir 5
Dept. of Computer Science and Telecommunication Engineering, Noakhali Science and Technology University, Bangladesh 1,2,3,4,5
Dept. of Computer Science and Engineering, Sonargaon University, Bangladesh 1

Abstract—Detecting sarcasm (i.e., the use of irony to mock or convey contempt) in tweets and on other social media platforms is one of the problems facing the regulation and moderation of social media content. Sarcasm is difficult to detect, even for humans, because of the deliberate ambiguity in how words are used. Existing approaches to automatic sarcasm detection rely primarily on lexical and linguistic cues. However, these approaches have produced little or no significant improvement in the accuracy of sentiment analysis. We propose a robust and efficient sarcasm detection system to improve the accuracy of sentiment analysis. In this study, four feature sets cover various types of sarcasm commonly used in social media. These feature sets are used to classify tweets as sarcastic or non-sarcastic. The study identifies a sarcastic feature set that, combined with an effective supervised machine learning model, leads to better accuracy. Results show that, with the right feature selection, Decision Tree (91.84%) and Random Forest (91.90%) outperform the other supervised machine learning algorithms in terms of accuracy. The paper highlights the suitable supervised machine learning models, along with their appropriate feature sets, for detecting sarcasm in tweets.

Keywords—Machine learning; detection; sarcasm; sentiment; tweets

I. INTRODUCTION

Sarcasm detection in opinion mining is an essential tool with various applications, including health, security, and sales [1, 2].
Several organizations and companies have shown interest in studying tweet data to learn people's opinions about popular products, political events, or movies. Millions of tweets are posted every day, which increases the volume of content on Twitter tremendously. However, the microblogging format (i.e., maximum 140 characters in every tweet) and the informal language it contains make it quite difficult to understand users' sentiment and perform sentiment analysis. Additionally, sarcasm poses a challenge in sentiment analysis and causes people's opinions to be misclassified, which reduces the accuracy of sentiment analysis. People use sarcasm to mock or convey contempt in writing or in speech, often using positive words to express negative feelings. Sarcasm, or irony, is now very common in social media, although it is challenging to detect. State-of-the-art approaches to opinion mining and sentiment analysis tend to perform unsatisfactorily when analyzing social media data. Maynard et al. [3] proposed that detecting sarcasm during sentiment analysis might significantly improve performance. Consequently, an effective method to identify sarcasm is needed.

In this paper, we propose an effective method to identify sarcastic tweets. Our strategy considers various types of sarcasm-indicating features, namely Lexical-based Features, Sarcastic-based Features, Contrast-based Features, and Context-based Features, and detects sarcastic tweets by applying multiple supervised machine learning models to the extracted features. We suggest an effective machine learning model and feature set that perform sarcasm detection more accurately for sentiment analysis; these are explained in the evaluation and result analysis parts of this paper.
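As a rough illustration of how a tweet might be turned into the kind of feature vector described above, the following minimal Python sketch computes one or two toy features per feature family. It is not the feature extraction used in this paper: the word lists, the function name `extract_features`, and every individual feature are hypothetical choices made only for this example.

```python
import re

# Tiny illustrative word lists; hypothetical, not the lexicons used in the paper.
POSITIVE = {"love", "great", "awesome", "best", "wonderful"}
NEGATIVE = {"stuck", "broken", "late", "worst", "waiting"}
INTERJECTIONS = {"wow", "yay", "oh", "ugh", "yeah"}


def extract_features(tweet: str) -> dict:
    """Map one tweet to a flat dictionary of numeric features (assumed design)."""
    tokens = re.findall(r"[a-z']+", tweet.lower())
    pos = sum(t in POSITIVE for t in tokens)
    neg = sum(t in NEGATIVE for t in tokens)
    return {
        # Lexical-style features: surface statistics of the text.
        "n_tokens": len(tokens),
        "n_exclaim": tweet.count("!"),
        # Sarcasm-marker-style features: interjections and all-caps words.
        "n_interjections": sum(t in INTERJECTIONS for t in tokens),
        "n_caps_words": len(re.findall(r"\b[A-Z]{2,}\b", tweet)),
        # Contrast-style feature: positive and negative words co-occur.
        "has_contrast": int(pos > 0 and neg > 0),
        # Context-style features: user mentions and hashtags.
        "n_mentions": len(re.findall(r"@\w+", tweet)),
        "n_hashtags": len(re.findall(r"#\w+", tweet)),
    }


features = extract_features("Wow, I LOVE being stuck in traffic! #blessed @city")
```

Dictionaries of this shape can be stacked into a numeric matrix, one row per tweet, and passed to any of the supervised classifiers compared in this study.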
The main contributions are as follows:

1) To detect sarcasm, we use a set of machine learning classification algorithms along with a variety of features to identify the best classifier model and the most significant features, which leads to recognizing sarcasm in tweets and thus to better sentiment analysis performance.

2) We propose the right set of features that lead to better accuracy, which is presented in the result analysis part of this paper.

3) Analysis results show that Decision Tree (91.84%) and Random Forest (91.90%) achieve higher accuracy than Logistic Regression, Gaussian Naive Bayes, and Support Vector Machine across the different feature setups.

The remainder of this paper is arranged as follows: Section 2 reviews the literature, and Section 3 describes the sarcasm recognition process. Section 4 presents our findings, and Section 5 concludes this work.

II. LITERATURE REVIEW

Many research articles and publications motivated us to work on this topic; a few of them are discussed here in detail. Sana Parveen and Sachin N. Deshmukh [4] suggested a methodology to recognize sarcasm on Twitter using Maximum Entropy and Support Vector Machine (SVM). Firstly, they created two datasets from collected