Research Article
A Hybrid Feature Extraction Method for Nepali
COVID-19-Related Tweets Classification
T.B. Shahi (1,2), C. Sitaula (1,3), and N. Paudel (1)
(1) Central Department of Computer Science and Information Technology, Tribhuvan University, 44600 Kathmandu, Nepal
(2) School of Engineering and Technology, Central Queensland University, Rockhampton 4701, QLD, Australia
(3) Department of Electrical and Computer Systems Engineering, Monash University, Clayton 3800, VIC, Australia
Correspondence should be addressed to N. Paudel; nawarajpaudel@cdcsit.edu.np
Received 7 December 2021; Accepted 10 February 2022; Published 9 March 2022
Academic Editor: Thippa Reddy G
Copyright © 2022 T.B. Shahi et al. This is an open access article distributed under the Creative Commons Attribution License,
which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.
COVID-19 is one of the deadliest viruses, which has killed millions of people around the world to this date. The reason for people's
deaths is linked not only to its infection but also to people's mental states and sentiments triggered by the fear of the virus. People's
sentiments, which are predominantly available in the form of posts/tweets on social media, can be interpreted using two kinds of
information: syntactical and semantic. Herein, we propose to analyze people's sentiments using both kinds of information
(syntactical and semantic) on the COVID-19-related Twitter dataset available in the Nepali language. For this, we first use two
widely used text representation methods, TF-IDF and FastText, and then combine them into hybrid features that capture
the highly discriminating features. Second, we implement nine widely used machine learning classifiers (Logistic Regression,
Support Vector Machine, Naive Bayes, K-Nearest Neighbor, Decision Trees, Random Forest, Extreme Tree classifier, AdaBoost,
and Multilayer Perceptron), based on the three feature representation methods: TF-IDF, FastText, and Hybrid. To evaluate our
methods, we use a publicly available Nepali COVID-19 tweets dataset, NepCOV19Tweets, which consists of Nepali tweets
categorized into three classes (Positive, Negative, and Neutral). The evaluation results on NepCOV19Tweets show that the
hybrid feature extraction method not only outperforms the other two individual feature extraction methods across the nine
machine learning algorithms but also provides excellent performance when compared with the state-of-the-art methods.
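The abstract states that TF-IDF and FastText representations are combined into hybrid features, but it does not say how at this point. The following is a minimal sketch, assuming the combination is a plain concatenation of the two vectors for each tweet. The functions `tfidf_vectors`, `mean_embedding`, and `hybrid_features` are hypothetical names, the tokens are toy English words rather than Nepali tweets, and the small `toy_embeddings` dictionary merely stands in for trained FastText word vectors.

```python
import math
from collections import Counter


def tfidf_vectors(docs):
    """Return (vocabulary, TF-IDF rows) for a list of tokenized documents."""
    vocab = sorted({tok for doc in docs for tok in doc})
    n_docs = len(docs)
    # Document frequency: number of documents that contain each term.
    df = Counter(tok for doc in docs for tok in set(doc))
    # Smoothed IDF, so a term present in every document keeps a nonzero weight.
    idf = {t: math.log((1 + n_docs) / (1 + df[t])) + 1 for t in vocab}
    rows = []
    for doc in docs:
        counts = Counter(doc)
        length = max(len(doc), 1)  # guard against empty documents
        rows.append([counts[t] / length * idf[t] for t in vocab])
    return vocab, rows


def mean_embedding(doc, embeddings, dim):
    """Average the word vectors of a document; unknown words are skipped."""
    known = [embeddings[t] for t in doc if t in embeddings]
    if not known:
        return [0.0] * dim
    return [sum(vec[i] for vec in known) / len(known) for i in range(dim)]


def hybrid_features(docs, embeddings, dim):
    """Assumed hybrid: TF-IDF row followed by the averaged embedding."""
    _, tfidf_rows = tfidf_vectors(docs)
    return [
        row + mean_embedding(doc, embeddings, dim)
        for doc, row in zip(docs, tfidf_rows)
    ]


# Toy data (hypothetical); real input would be tokenized Nepali tweets.
docs = [["good", "news", "today"], ["bad", "news"], ["news", "today"]]
toy_embeddings = {
    "good": [1.0, 0.0],
    "bad": [-1.0, 0.0],
    "news": [0.0, 1.0],
    "today": [0.0, 0.5],
}
features = hybrid_features(docs, toy_embeddings, dim=2)
```

Each row of `features` has one entry per vocabulary term followed by the embedding dimensions, so any of the classifiers listed above could in principle be trained on it unchanged.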
1. Introduction
Natural language processing (NLP) techniques have been
developed to assess people's sentiments on various topics.
The assessment of documents as Negative, Positive, or
Neutral is known as sentiment analysis. Sentiment analysis
of documents mainly involves sentiment classification,
topic modeling, and opinion mining. In particular, we
obtain textual documents from various sources, such as
social media posts and news documents. These documents
reflect people's feelings, which allows us to identify their
sentiments using machine learning techniques.
Currently, the number of social media posts, particularly
tweets, related to COVID-19 is growing rapidly. Processing
and analyzing these posts lets us understand people's
mental stress. To this end, the design and development of an
automated AI tool is essential to understand and deal with
people's mental stress. There are few research works on AI
models developed for Nepali COVID-19-related sentiment
analysis in the literature; therefore, we discuss the sentiment
analysis works carried out in the Nepali language as well as
a few other languages, such as English.
Recent works [1–8] on sentiment analysis of COVID-19
tweets in English and other languages [8] underscore the
efficacy of data-driven machine learning approaches; they
employ several kinds of analysis, such as topic modeling,
classification, and clustering. This calls for a thorough
comparison of machine learning methods for sentiment
classification, together with a better representation of
tweets. For this, they used popular
feature extraction methods such as TF-IDF (Term
Hindawi
Computational Intelligence and Neuroscience
Volume 2022, Article ID 5681574, 11 pages
https://doi.org/10.1155/2022/5681574