Research Article
Sentence Classification Using N-Grams in Urdu Language Text
Malik Daler Ali Awan,1 Sikandar Ali,2 Ali Samad,1 Nadeem Iqbal,3 Malik Muhammad Saad Missen,1 and Niamat Ullah4
1 Department of Information Technology, Faculty of Computing, The Islamia University of Bahawalpur, 63100 Bahawalpur, Pakistan
2 Department of Information Technology, The University of Haripur, 22621 Haripur, Khyber Pakhtunkhwa, Pakistan
3 Muhammad Nawaz Shareef University of Agriculture, Multan 61000, Pakistan
4 Department of Computer Science, University of Buner, 19290 Sawarai Buner, Khyber Pakhtunkhwa, Pakistan
Correspondence should be addressed to Sikandar Ali; sikandar@cup.edu.cn
Received 17 April 2021; Revised 27 May 2021; Accepted 7 November 2021; Published 22 November 2021
Academic Editor: Wei-Chuen Yau
Copyright © 2021 Malik Daler Ali Awan et al. This is an open access article distributed under the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.
The use of local languages is becoming common on social media and news channels. People share worthy insights about various topics related to their lives in different languages. A bulk of text in various local languages exists on the Internet and contains invaluable information. Analysis of such local-language text will certainly help improve a number of Natural Language Processing (NLP) tasks. The information extracted from local languages can be used to develop various applications and add new milestones in the field of NLP. In this paper, we present an applied research task: multiclass classification, at the sentence level, of Urdu language text from social networks (i.e., Twitter and Facebook) and news channels using N-gram features. Our dataset consists of more than 100,000 instances covering twelve (12) different types of topics. A well-known machine learning classifier, Random Forest, is used to classify the sentences. It achieved 80.15%, 76.88%, and 64.41% accuracy for unigram, bigram, and trigram features, respectively.
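As an illustration of the unigram, bigram, and trigram features named above, the following minimal sketch extracts word-level N-grams from a single sentence. It assumes whitespace tokenization and raw counts; the helper names `word_ngrams` and `ngram_counts` and the sample sentence are hypothetical and are not taken from the authors' pipeline or dataset.

```python
from collections import Counter


def word_ngrams(tokens, n):
    """Return the contiguous word n-grams (as tuples) found in tokens."""
    # When there are fewer than n tokens, the range is empty and so is the result.
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def ngram_counts(sentence, n):
    """Split a sentence on whitespace (an assumption) and count its word n-grams."""
    tokens = sentence.split()
    return Counter(word_ngrams(tokens, n))


# Illustrative sentence, not from the paper's dataset: "Pakistan won the match".
sentence = "پاکستان نے میچ جیت لیا"

unigrams = ngram_counts(sentence, 1)  # single words
bigrams = ngram_counts(sentence, 2)   # pairs of adjacent words
trigrams = ngram_counts(sentence, 3)  # triples of adjacent words
```

Counts of this kind, collected over a shared vocabulary, could serve as the feature vectors passed to a classifier such as Random Forest. A sentence of k words yields k - n + 1 N-grams of order n, so higher-order features are fewer per sentence and sparser across a corpus.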
1. Introduction
Text is still the dominant and prominent way of communication, rather than pictures, emoji, sounds, and animations alone. The innovative environment of communication, the real-time availability of the Internet, and the unrestricted communication mode of social networks have attracted billions of people around the world. People share insights about various topics, opinions, views, ideas, and events happening around them on social networks in different languages. Communication platforms such as social media and news channels have created space for local languages to share information. The Google input tool (https://www.google.com/inputtools/) provides language transliteration support for 88 different languages. The development of many tools supporting local languages is another factor that has boosted the usage of local languages on social media and news channels. People prefer to communicate in local languages instead of global languages because messages are easier to convey. This also generates heterogeneous data on the Internet.
Sifting worthy insights from an immense amount of heterogeneous text in multiple local languages on social media is one of the interesting and challenging tasks of Natural Language Processing (NLP). Local language processing certainly provides invaluable insights for developing NLP applications. These applications can respond in emergencies, outbreaks, and natural disasters, e.g., rain, flood, and earthquake [1]. Interesting features of social media, such as real-time interaction, have enabled millions of people to share their intent, appreciation, or criticism [2], e.g., enjoying a discount offered by brands or criticizing the quality of a product. Extracting and classifying such information is valuable for improving the quality of the product. The implementation of smart cities poses many challenges, such as decision making, event management, communication, and information retrieval. Extracting useful
Hindawi
Scientific Programming
Volume 2021, Article ID 1296076, 11 pages
https://doi.org/10.1155/2021/1296076