International Research Journal of Engineering and Technology (IRJET) e-ISSN: 2395-0056
Volume: 07 Issue: 04 | Apr 2020 www.irjet.net p-ISSN: 2395-0072
© 2020, IRJET | Impact Factor value: 7.529 | ISO 9001:2008 Certified Journal | Page 5533
AUTOMATIC EMOJI LEXICON CONSTRUCTION FOR SENTIMENT ANALYSIS USING WORD KNOWLEDGE
P. Akshaya¹, Mrs. K. Krishnakumari²
¹M.E. Student, Department of CSE, A.V.C College of Engineering, Tamil Nadu, India
²Associate Professor, Department of CSE, A.V.C College of Engineering, Tamil Nadu, India
---------------------------------------------------------------------***---------------------------------------------------------------------
Abstract - In the field of sentiment analysis, emoji have received little attention. Nowadays, users also share or
express their opinions through emoji alone, without any text. Most existing approaches train their classification systems on
manually labeled datasets, which requires a large amount of training data and is therefore expensive. The proposed system
instead performs word-level, lexicon-based sentiment analysis using a lexicon graph built from databases such as WordNet and
WNRH. In this approach, a relatedness score enables fine-grained sentiment analysis, and emoji sentiment scores are calculated
from the co-occurrence frequency between emoji and sentiment words. The experimental results show that this analysis
provides an unsupervised way of finding relatedness scores for both text and emoji.
Key Words: Sentiment Analysis, graph-based approach, lexicon-based, Emoji, WordNet
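The abstract states that emoji sentiment scores are computed from the co-occurrence frequency between emoji and sentiment words, but gives no formula at this point. The Python sketch below shows one way such a score could be computed. The seed lexicon `SEED_LEXICON`, the function `emoji_scores`, whitespace tokenization and the normalized score (pos - neg) / (pos + neg) are all illustrative assumptions, not the authors' method.

```python
from collections import defaultdict

# Hypothetical seed lexicon (assumption): word -> polarity, +1 positive, -1 negative.
SEED_LEXICON = {"love": 1, "great": 1, "happy": 1, "hate": -1, "awful": -1, "sad": -1}


def emoji_scores(tweets, emojis):
    """Score each emoji by its co-occurrence with positive vs. negative seed words.

    Assumed score: (pos - neg) / (pos + neg), which lies in [-1, 1].
    Tokens are split on whitespace, so emoji must be space-separated.
    Emoji that never co-occur with a seed word are omitted from the result.
    """
    pos = defaultdict(int)
    neg = defaultdict(int)
    for tweet in tweets:
        tokens = tweet.lower().split()
        present = {t for t in tokens if t in emojis}  # emoji types in this tweet
        for token in tokens:
            polarity = SEED_LEXICON.get(token, 0)
            # Each sentiment-word token counts once for every emoji type present.
            for e in present:
                if polarity > 0:
                    pos[e] += 1
                elif polarity < 0:
                    neg[e] += 1
    scores = {}
    for e in emojis:
        total = pos[e] + neg[e]
        if total:
            scores[e] = (pos[e] - neg[e]) / total
    return scores


tweets = [
    "i love this phone 😍",
    "great day so happy 😍",
    "i hate mondays 😢",
    "awful service 😢 but great food",
]
# Here 😍 scores (3 - 0) / 3 = 1.0 and 😢 scores (1 - 2) / 3, about -0.33.
print(emoji_scores(tweets, {"😍", "😢"}))
```

Normalizing by the total count keeps each score in [-1, 1] regardless of how frequently an emoji appears in the corpus.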
1. INTRODUCTION
Sentiment Analysis (SA) is the process of extracting sentiment, such as positive, negative or neutral, from a given
dataset, which may contain user reviews of products or services collected from social media such as Twitter,
Facebook and blogs. SA has two sub-tasks: data acquisition and data pre-processing. Data acquisition is the
collection of reviews or feedback forms from websites or social media networks. Data pre-processing includes
tokenization, stemming, stop-word removal, etc. SA saves a large amount of time compared with processing
unstructured data manually. Nowadays, real-time sentiment analysis is used to learn users' actual emotions about
political issues, products, services and sales marketing. Sentiment can be extracted from the source at different
levels: word, phrase, sentence, document and aspect.
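The pre-processing steps named above (tokenization, stop-word removal and stemming) can be sketched in Python as follows. This is a minimal illustration under stated assumptions: the stop-word list `STOP_WORDS` and the suffix list `SUFFIXES` are hypothetical, and the naive suffix stripping stands in for a real stemmer such as Porter's. The regular expression keeps each single-code-point emoji as its own token, which matters when emoji carry the sentiment.

```python
import re

# Hypothetical, tiny stop-word list for illustration only.
STOP_WORDS = {"a", "an", "the", "is", "are", "was", "this", "that", "and", "or", "it", "i"}

# Naive suffix list (assumption); a stand-in for a real stemmer such as Porter's.
SUFFIXES = ("ing", "ed", "ly", "s")


def tokenize(text):
    """Lower-case the text; return word tokens plus single non-space symbols (e.g. emoji)."""
    return re.findall(r"\w+|[^\w\s]", text.lower())


def stem(token):
    """Strip the first matching suffix if a stem of at least 3 characters remains."""
    for suffix in SUFFIXES:
        if token.endswith(suffix) and len(token) - len(suffix) >= 3:
            return token[: -len(suffix)]
    return token


def preprocess(text):
    """Tokenize, drop stop words, then stem the remaining tokens."""
    return [stem(t) for t in tokenize(text) if t not in STOP_WORDS]


print(preprocess("The camera is amazing and I loved it 😍"))  # ['camera', 'amaz', 'lov', '😍']
```

Multi-code-point emoji (skin-tone modifiers, ZWJ sequences) would be split by this simple pattern; a production tokenizer would need to handle emoji sequences explicitly.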
1.1 Lexical Sentiment Analysis
Lexical Sentiment Analysis (LSA) uses Natural Language Processing and Machine Learning techniques, together with
automated tools, to extract emotions such as positive, negative and neutral from textual data in user feedback and
reviews on web platforms. Recent approaches use techniques for automatically creating sentiment corpora. LSA
approaches can be classified as corpus-based, which are data-driven, or lexicon-based, which are knowledge-driven.
The corpus-based approach analyzes large text corpora, e.g., ISEAR. It estimates the probability of occurrence of
textual features such as lexical forms, POS tags, n-grams or phrasal patterns, which enables sentiment predictions
for new texts. However, it is generally data-hungry and requires considerable manual effort to produce a relevant
sense-annotated corpus. The lexicon-based approach obtains sentiment clues from a readily available sentiment
lexicon, which is simply a list of words or phrases annotated with the sentiment they express, such as positive or
negative. For example, sentiment lexicons for 81 languages are available on the Kaggle website, and other
lexicons include SentiWordNet and WNA. Because the corpus-based approach requires huge manual effort, the
lexicon-based approach has lately been widely used for research. Sentiment extraction can be performed at
different levels of granularity: a word, a phrase, a sentence, a document or an aspect. Phrase sentiments are
deduced from word-level sentiments, whereas sentence-level lexical sentiment extraction is based on either
word-level or phrase-level LSA. Word-level LSA is the most fine-grained approach, in which each word is
associated with sentiment categories, and it allows LSA to utilize the text at higher levels of granularity. Words
can be processed either separately, i.e., by the standalone method, or by considering their surrounding textual
context. Other processes consider the domain or