International Research Journal of Engineering and Technology (IRJET) e-ISSN: 2395-0056 Volume: 07 Issue: 04 | Apr 2020 www.irjet.net p-ISSN: 2395-0072 © 2020, IRJET | Impact Factor value: 7.529 | ISO 9001:2008 Certified Journal | Page 5533

AUTOMATIC EMOJI LEXICON CONSTRUCTION FOR SENTIMENT ANALYSIS USING WORD KNOWLEDGE

P.Akshaya 1 , Mrs.K.Krishnakumari 2
1 M.E Student, Department of CSE, A.V.C College of Engineering, Tamil Nadu, India
2 Associate Professor, Department of CSE, A.V.C College of Engineering, Tamil Nadu, India
---------------------------------------------------------------------***---------------------------------------------------------------------

Abstract - In the field of sentiment analysis, emoji have received little attention. Nowadays, users often share or express their opinions through emoji alone, without any text. Most existing approaches train a classification system on a manually labeled dataset. This requires a large amount of training data, which is expensive to produce. The proposed system therefore performs lexicon-based sentiment analysis at the word level, using a lexicon graph built from databases such as WordNet and WNRH. In this approach, a relatedness score supports fine-grained sentiment analysis. The emoji sentiment scores are calculated from the co-occurrence frequency between emoji and sentiment words. The experimental results show that this analysis provides an unsupervised way of finding relatedness scores for both text and emoji.

Key Words: Sentiment Analysis, graph-based approach, lexical-based, Emoji, WordNet

1. INTRODUCTION

Sentiment Analysis (SA) is the process of extracting sentiment, such as positive, negative and neutral, from a given dataset, which may contain user reviews of products or services collected from social media such as Twitter, Facebook, blogs, etc. SA has two sub-tasks: data acquisition and data pre-processing.
Data acquisition is the collection of reviews or feedback forms from websites or social media networks. Data pre-processing includes tokenization, stemming, stop word removal, etc. SA saves a large amount of processing time on unstructured data compared with manual data processing. Nowadays, real-time sentiment analysis is performed to learn users' real emotions about political issues, products, services and sales marketing. Sentiments can be extracted from the source at different levels: word, phrase, sentence, document and aspect.

1.1 Lexical Sentiment Analysis

The LSA method utilizes Natural Language Processing and Machine Learning techniques and automated tools to extract emotions such as positive, negative and neutral from textual data in user feedback and reviews on web platforms. Recent approaches use techniques for automatically creating sentiment corpora. LSA can be classified into the corpus-based approach, which is data-driven, and the lexicon-based approach, which is knowledge-driven.

The corpus-based approach analyzes large text corpora, e.g., ISEAR. It identifies the probability of occurrence of textual features such as lexical forms, POS tags, n-grams or phrasal patterns, which enables sentiment predictions for new texts. However, it is generally data hungry and requires considerable manual effort to produce a relevant sense-annotated corpus.

The lexicon-based approach takes its sentiment clues from a readily available sentiment lexicon. A sentiment lexicon is a list of words or phrases, each of which expresses a positive or negative sentiment. For example, sentiment lexicons for 81 languages are available on the Kaggle website, and others include Senti-WordNet, WNA, etc.
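
As a rough illustration of the lexicon-based idea described above, the following minimal sketch tokenizes a review, removes stop words and sums the polarity of the words found in a lexicon. It is not the proposed system: the lexicon `TOY_LEXICON`, the stop word list `STOP_WORDS` and the function names are hypothetical placeholders, and a real setup would load an existing resource such as Senti-WordNet instead of a hand-written dictionary.

```python
import re

# Hypothetical toy lexicon (assumption for illustration only);
# a real system would load a published sentiment lexicon instead.
TOY_LEXICON = {
    "good": 1.0, "great": 1.0, "love": 1.0,
    "bad": -1.0, "poor": -1.0, "hate": -1.0,
}

# Hypothetical, deliberately tiny stop word list.
STOP_WORDS = {"the", "a", "an", "is", "was", "and", "but", "i", "this"}


def tokenize(text):
    """Lowercase the text and split it into alphabetic tokens."""
    return re.findall(r"[a-z']+", text.lower())


def word_level_sentiment(text, lexicon=TOY_LEXICON):
    """Return (label, score) by summing lexicon polarities of the tokens."""
    tokens = [t for t in tokenize(text) if t not in STOP_WORDS]
    # Words missing from the lexicon contribute nothing to the score.
    score = sum(lexicon[t] for t in tokens if t in lexicon)
    if score > 0:
        label = "positive"
    elif score < 0:
        label = "negative"
    else:
        label = "neutral"
    return label, score


if __name__ == "__main__":
    for review in ["I love this great phone",
                   "The service was bad",
                   "The phone arrived"]:
        print(review, "->", word_level_sentiment(review))
```

The sketch treats every word in isolation, which corresponds to the standalone word-level setting; it has no notion of negation, context or emoji, and those would need additional handling.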

Due to the huge manual effort required by the corpus-based approach, the lexicon-based approach has lately been widely used in research. Generally, sentiment extraction can be performed at different levels of granularity: word, phrase, sentence, document and aspect. Phrase sentiments are deduced from word-level sentiments, whereas sentence-level lexical sentiment extraction is based on either word-level or phrase-level LSA. Word-level LSA is the fine-grained approach, in which each word is associated with sentiment categories, and it allows LSA to be applied to text at higher granularities. Words can be processed either separately, i.e., the standalone method, or by considering their textual surroundings, i.e., the context. Some other processes are by considering the domain or