www.ijraset.com Volume 5 Issue VI, June 2017
IC Value: 45.98 ISSN: 2321-9653
International Journal for Research in Applied Science & Engineering
Technology (IJRASET)
© IJRASET: All Rights are Reserved
Comparative Analysis of Supervised Approaches
for Word Sense Disambiguation Using Text
Similarity
Subha Mahajan¹, Rakesh Kumar², Vibhakar Mansotra³
Department of Computer Science and IT, University of Jammu, Jammu
Abstract: Words that correspond to two or more meanings rather than to a single meaning are semantically ambiguous.
Measuring the similarity between words, sentences and paragraphs is an important part of information retrieval and
word sense disambiguation tasks. One of the biggest challenges in Natural Language Processing is for a system to
determine the sense in which a specific word is being used. This paper analyses text in order, first, to ascertain
whether any similarity exists and, second, to resolve the ambiguity in the text. It presents a comparison of machine
learning approaches for text similarity analysis. The Naive Bayes approach was observed to outperform the other
approaches, including SVM, Maximum Entropy, Tree, Random Forest and Bagging.
Keywords: Text Similarity, Word Sense Disambiguation, Approaches, SENSEVAL, Supervised machine learning algorithms.
I. INTRODUCTION
In the present world, people depend mainly on the web to search for any kind of content. Search engines have done a
remarkable job of information retrieval; however, the goal of retrieving only relevant information is still far from achieved.
A person searching for information on the web does not usually consider the ambiguity of a word, or whether the content
retrieved will be relevant to them. It becomes difficult for the user to get relevant information in any language when a word or
phrase has more than one interpretation. One step towards realizing this goal is the detection of text similarity, i.e. determining
how close the meanings of two given texts are. Text similarity detection [1] plays an important role in text-related tasks such as
information retrieval, word sense disambiguation (WSD), machine translation, information extraction, speech recognition and
others. For example, in the phrase “The second hand of the clock is not working”, the word “second” means a basic unit of
time, while in the phrase “Ram came second in the class”, the word “second” refers to a position in a series. The problem
can be reduced to some extent by disambiguating the word. When a word has multiple meanings, it is considered ambiguous.
Word sense disambiguation (WSD), the process of identifying the correct sense of a word in a given context, is an open
problem of natural language processing. WSD plays an important role in improving the quality of information by determining
the sense in which a specific word is being used. WSD was first formulated as a distinct computational task during the early
days of machine translation in the late 1940s, making it one of the oldest problems of computational semantics. It remained a
challenging task until suitable resources became available. In the 1980s there was substantial progress in WSD research when
large-scale lexical resources and corpora came into existence. In the 1990s, three major developments took place for WSD:
the online dictionary WordNet, which is organised into sets of word senses called synsets and is used as an online sense
inventory; statistical methodologies, which treat WSD as a sense classification problem; and SENSEVAL, which was proposed
in 1997 by Resnik and Yarowsky. Further SENSEVAL evaluation exercises have since been introduced so that researchers can
share and improve their work in this research area.
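The link between text similarity and disambiguation described above can be illustrated with a minimal sketch. It is not the method evaluated in this paper: it simply scores the context of the word “second” against one short description per sense using bag-of-words cosine similarity and picks the closest sense. The stop-word list, the sense labels and the sense descriptions are hypothetical and written only for this example; they are not taken from WordNet or SENSEVAL.

```python
import math
import re
from collections import Counter

# Hypothetical, deliberately tiny stop-word list (assumption for this sketch).
STOP_WORDS = {"the", "of", "is", "in", "a", "an", "not", "to"}


def bag_of_words(text):
    """Lowercase the text, keep alphabetic tokens, and count non-stop words."""
    tokens = re.findall(r"[a-z]+", text.lower())
    return Counter(t for t in tokens if t not in STOP_WORDS)


def cosine_similarity(a, b):
    """Cosine of the angle between two sparse word-count vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # an empty text is treated as similar to nothing
    return dot / (norm_a * norm_b)


# Hypothetical sense descriptions for the word "second" (not from WordNet).
SENSES = {
    "unit_of_time": "unit of time measured by a clock hand minute hour",
    "position_in_series": "position rank after first in a class race series",
}


def disambiguate(context, senses=SENSES):
    """Return the sense whose description is most similar to the context,
    together with the similarity score of every sense."""
    ctx = bag_of_words(context)
    scores = {
        sense: cosine_similarity(ctx, bag_of_words(description))
        for sense, description in senses.items()
    }
    return max(scores, key=scores.get), scores


if __name__ == "__main__":
    for sentence in (
        "The second hand of the clock is not working",
        "Ram came second in the class",
    ):
        sense, _ = disambiguate(sentence)
        print(sentence, "->", sense)
```

With these hand-written descriptions, the first sentence shares “clock” and “hand” with the time sense and the second shares “class” with the position sense, so each is assigned the reading given in the example above. A supervised approach would replace the fixed descriptions with features learned from sense-annotated training data.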
II. LITERATURE REVIEW
When work began on processing different languages by automatic means, the problem of ambiguity drew the interest of
researchers at the same time. Work on ambiguity in sense annotation has often focused on techniques to reduce ambiguity in
the sense inventory. We can therefore say that WSD is one of the oldest tasks for solving lexical ambiguity. Among the many
researchers in this area, [2] Mukti Desai and Mrs. Kiran Bhowmick (2013) have surveyed different approaches and
techniques of WSD for resolving ambiguity. [3] A. R. Rezapour et al. (2011) have used the K-Nearest Neighbor algorithm, a
supervised learning method, for WSD. The authors performed feature extraction, which includes the set of words that occur
frequently in the text and the set of words surrounding the ambiguous word, so as to improve the classification accuracy. [4] Arti
Mishra and Meenakshi Pathak (2014) have analyzed web queries in the English language to study the effect on the performance of various