www.ijraset.com, Volume 5 Issue VI, June 2017, IC Value: 45.98, ISSN: 2321-9653
International Journal for Research in Applied Science & Engineering Technology (IJRASET)
© IJRASET: All Rights are Reserved

Comparative Analysis of Supervised Approaches for Word Sense Disambiguation Using Text Similarity

Subha Mahajan 1, Rakesh Kumar 2, Vibhakar Mansotra 3
Department of Computer Science and IT, University of Jammu, Jammu

Abstract: Words that correspond to two or more meanings rather than to a single meaning are semantically ambiguous. Measuring the similarity between words, sentences, and paragraphs is an important part of information retrieval and word sense disambiguation tasks. One of the biggest challenges in Natural Language Processing is for the system to determine in what sense a specific word is being used. This paper first analyzes text to ascertain whether any similarity exists, and second attempts to resolve the ambiguity in the text. It presents a comparison of machine learning approaches to text similarity analysis. The Naive Bayes approach was observed to outperform the other approaches, including SVM, Maximum Entropy, Tree, Random Forest, and Bagging.

Keywords: Text Similarity, Word Sense Disambiguation, Approaches, SENSEVAL, Supervised machine learning algorithms.

I. INTRODUCTION

Today, people depend mainly on the web when searching for any kind of content. Search engines have done a remarkable job of information retrieval, but the goal of retrieving only relevant information is still a long way off. A person searching the web usually does not consider the ambiguity of a word, or whether the content retrieved will be relevant to them. It becomes difficult for the user to get relevant information in any language when a word or phrase has more than one interpretation.
One step towards realizing this goal is detecting the similarity of texts, i.e. determining how close the meanings of two given texts are. The idea is based on text similarity [1] detection, which plays an important role in text-related tasks such as information retrieval, word sense disambiguation (WSD), machine translation, information extraction, speech recognition, and others. For example, in the phrase "The second hand of the clock is not working", the word "second" means a basic unit of time, while in the phrase "Ram came second in the class", the word "second" refers to a position in a series. The problem can be reduced to an extent by disambiguating the word. A word that has multiple meanings is considered ambiguous. Word sense disambiguation (WSD) is an open problem of natural language processing: the process of identifying the correct sense of a word in a given context. WSD plays an important role in improving the quality of information by establishing in what sense a specific word is being used.

WSD was first formulated as a distinct computational task during the early days of machine translation in the late 1940s, making it one of the oldest problems of computational semantics. It remained a challenging task until suitable resources became available. In the 1980s WSD research developed considerably as large-scale lexical resources and corpora came into existence. In the 1990s, three major developments shaped WSD: the online dictionary WordNet, which organises word senses into sets called synsets and serves as an online sense inventory; statistical methodologies, which treat WSD as a sense classification problem; and SENSEVAL, which was proposed in 1997 by Resnik and Yarowsky. Further SENSEVAL evaluation exercises have since been introduced so that researchers can share and refine their approaches in this research area.

II. LITERATURE REVIEW

When work began on handling different languages by automatic means, the problem of ambiguity drew the interest of researchers at the same time. Work on ambiguity in sense annotation has often focused on techniques to reduce ambiguity in the sense inventory. The WSD task is therefore one of the oldest tasks for resolving lexical ambiguity. Among the many researchers in this area, [2] Mukti Desai and Mrs. Kiran Bhowmick (2013) have surveyed the different approaches and techniques of WSD for resolving ambiguity. [3] A. R. Rezapour et al. (2011) used the K-Nearest Neighbor algorithm, a supervised learning method, for WSD. The authors performed feature extraction covering the set of words that occur frequently in the text and the set of words surrounding the ambiguous word, so as to improve classification accuracy. [4] Arti Mishra and Meenakshi Pathak (2014) analyzed web queries in the English language to study the effect on the performance of various