Copyright © 2018 Authors. This is an open access article distributed under the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.

International Journal of Engineering & Technology, 7 (2.6) (2018) 189-192
Website: www.sciencepubco.com/index.php/IJET
Research Paper

Kannada word sense disambiguation by finding the overlaps between the concepts

B H Manjunatha Kumar 1*, Dr. M. Siddappa 2, Dr. J. Prakash 3
1 Research scholar, Visvesvaraya Technological University, Belagavi, Karnataka, India
2 Prof. and Head, Dept. of CSE, Sri Siddhartha Institute of Technology, Tumkur, Karnataka, India
3 Prof. and Head, Dept. of ISE, Bangalore Institute of Technology, Bangalore, Karnataka, India
*Corresponding author E-mail: bhm.nlp@gmail.com

Abstract

We propose three approaches for disambiguating Kannada words, based on an adaptation of Lesk's dictionary-based word sense disambiguation technique. Instead of using a regular dictionary as the repository of glosses, we use the Indo WordNet lexical database as the source of senses. We adopt an existing method of measuring semantic relatedness between the concepts of Kannada words taken from Indo WordNet. This measure depends on identifying and counting the words common to the glosses of a pair of concepts in Indo WordNet.

Keywords: Indo WordNet, Kannada Word Sense Disambiguation, semantic relatedness, WordNet.

1. Introduction

The existence of several meanings in a single word is deeply rooted in natural language. Every natural language has many words with several senses. In English, the word bark has multiple meanings, such as the harsh sound made by a dog, the tough outer layer of a tree, or a kind of boat. Humans are fairly good at deciding the proper meaning.
However, this task is difficult for computers. The computational process of finding the proper meaning of a word that has several meanings is called Word Sense Disambiguation (WSD). Despite this difficulty, automating the task is attractive because it can play an important role in machine translation.

The technique proposed, implemented and evaluated in this paper uses the Lesk [1] method to measure semantic relatedness. In WordNet, every concept is described by a gloss. The Lesk algorithm uses the text of the gloss to represent the concept itself. In the Lesk [1] algorithm, the degree of relatedness is calculated by finding the overlaps between the glosses of two concepts, along with the glosses of concepts that are directly connected to them in WordNet. In this paper, we use Indo WordNet [2] to extract the lexically expressed concepts. We adapt the Lesk algorithm to Indo WordNet; this technique does not require any training corpus.

A good deal of related work uses measures of semantic similarity. Budanitsky et al. [3] examined five approaches to measuring semantic relatedness. They compared the effectiveness of the five approaches in correcting spelling mistakes and recorded an accuracy of 65%. They also found that the information-content-based measure proposed by Jiang performs better than those of Hirst [4], Resnik [7], Lin [6], and Leacock et al. [5]. Reddy et al. [8] compared and evaluated six measures of semantic relatedness in semantic classification and labeling for the Hindi language. They found that the modified Lesk measure performs better than the other five. Sinha et al. [9] proposed a graph-based unsupervised algorithm for WSD and reported results on datasets such as SENSEVAL-2 and SENSEVAL-3 using six different measures. Torres et al.
examined combinations of similarity measures; their experimental results indicate that a combination of measures is more accurate than any individual measure. However, these results do not necessarily extend to the Kannada language. In this paper, we investigate the benefits of the Lesk technique for measuring semantic similarity in Kannada. To the best of our knowledge, this kind of work has not been reported previously for Kannada WSD.

Sinha et al. [14] performed context overlapping for Hindi WSD using the extended Lesk algorithm. They used the sense definitions of the polysemous Hindi target word, taken from Hindi WordNet, to compute the overlap. The sense with the highest score is assigned as the appropriate sense of the ambiguous Hindi word. Khapra et al. [12] performed domain-specific word sense disambiguation in three languages: English, Marathi, and Hindi. They used the main senses of the target word in a particular domain. Singh et al. reported the effects of stemming, stop-word removal, and context window size on the overlap-based Lesk algorithm for Hindi WSD, and recorded a 9.24% improvement in accuracy. R. Sawhney et al. [18] proposed a modified WSD algorithm for the Hindi language, using the Lesk technique to disambiguate the ambiguous word. A. R. Pal et al. [20] presented a knowledge-based method for WSD of the Bengali language. They used the Bengali WordNet as the knowledge base and achieved an accuracy of 75%. We also consulted a survey on word sense disambiguation [19] to understand the merits and demerits of supervised and unsupervised approaches to WSD.

In this paper, we first describe the original Lesk technique for WSD, followed by our WSD algorithm. Next, we present the dataset used in the experiment, followed by the experimental results and a discussion of them.
Finally, we conclude with a discussion of the outcomes and recommendations for future enhancements.
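
The gloss-overlap idea outlined in this introduction can be illustrated with a minimal sketch. It is not the algorithm proposed in this paper: it only counts the words shared between a context and each candidate gloss and picks the sense with the highest count. The function names, the stop-word list, and the English glosses for bark are assumptions made for illustration and are not taken from Indo WordNet.

```python
from typing import Dict, Set

# Hypothetical stop-word list, for illustration only.
STOP_WORDS: Set[str] = {"a", "an", "the", "of", "or", "and", "in", "to", "is"}


def tokenize(text: str) -> Set[str]:
    """Lowercase, split on whitespace, strip punctuation, drop stop words."""
    words = (w.strip(".,;:!?()\"'").lower() for w in text.split())
    return {w for w in words if w and w not in STOP_WORDS}


def overlap_score(text_a: str, text_b: str) -> int:
    """Number of distinct non-stop words that the two texts share."""
    return len(tokenize(text_a) & tokenize(text_b))


def pick_sense(context: str, sense_glosses: Dict[str, str]) -> str:
    """Return the sense id whose gloss overlaps most with the context.

    Ties go to the sense that appears first in the dictionary.
    """
    return max(sense_glosses,
               key=lambda sense: overlap_score(context, sense_glosses[sense]))


# Invented English glosses for "bark" (assumption, not Indo WordNet data).
glosses = {
    "bark_sound": "the harsh sound made by a dog",
    "bark_tree": "the tough outer covering of the trunk and branches of a tree",
}
context = "the dog made a loud sound at night"

best = pick_sense(context, glosses)
print(best, overlap_score(context, glosses[best]))
```

In this toy example the context shares the words dog, made, and sound with the first gloss and nothing with the second, so the first sense is selected. A Kannada version would need a tokenizer and stop-word list suited to the language, and glosses drawn from Indo WordNet.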