Copyright © 2018 Authors. This is an open access article distributed under the Creative Commons Attribution License, which permits
unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.
International Journal of Engineering & Technology, 7 (2.6) (2018) 189-192
International Journal of Engineering & Technology
Website: www.sciencepubco.com/index.php/IJET
Research Paper
Kannada word sense disambiguation by finding the overlaps between the concepts
B H Manjunatha Kumar¹*, Dr. M. Siddappa², Dr. J. Prakash³
¹ Research scholar, Visvesvaraya Technological University, Belagavi, Karnataka, India
² Prof. and Head, Dept. of CSE, Sri Siddhartha Institute of Technology, Tumkur, Karnataka, India
³ Prof. and Head, Dept. of ISE, Bangalore Institute of Technology, Bangalore, Karnataka, India
*Corresponding author E-mail: bhm.nlp@gmail.com
Abstract
We propose three approaches for disambiguating Kannada words, based on an adaptation of Lesk's dictionary-based word sense
disambiguation technique. Instead of using a regular dictionary as the repository of glosses, we use the Indo-WordNet lexical
database as the source of senses. We adopt an existing method of measuring semantic relatedness between the concepts of
Kannada words taken from Indo-WordNet. This measure depends on identifying and counting the words that are common to
the glosses of a pair of concepts in Indo-WordNet.
Keywords: Indo-WordNet, Kannada Word Sense Disambiguation, semantic relatedness, WordNet.
1. Introduction
The existence of several meanings in a single word is deeply rooted
in natural language. Every natural language has many words with
several senses. In English, for example, the word bark can mean
the harsh sound made by a dog, the tough outer layer of a tree, or
a boat. Humans are fairly good at choosing the proper meaning, but
the task is difficult for computers. The computational process of
finding the proper meaning of a word that has several meanings is
called word sense disambiguation (WSD). Despite this difficulty,
automating the task is worthwhile because it can play an important
role in machine translation. The technique proposed, implemented,
and evaluated in this paper uses the Lesk [1] method to measure
semantic relatedness. In WordNet, every concept is described by a
gloss, and the Lesk algorithm uses the text of the gloss to
represent the concept. In the Lesk [1] algorithm, the degree of
relatedness is calculated by finding the overlaps between the
glosses of two concepts, along with those of concepts that are
directly connected to them in WordNet. In this paper, we use
Indo-WordNet [2] to extract the lexically expressed concepts. We
adapt the Lesk algorithm to Indo-WordNet, and the resulting
technique does not require any training corpus.
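The gloss-overlap idea described above can be illustrated with a minimal sketch. It is not the implementation evaluated in this paper: the concept identifiers, the English glosses, and the RELATED links are hypothetical stand-ins for entries that would be read from Indo-WordNet, and the score is simplified to the number of distinct words shared by the two gloss sets.

```python
from typing import Dict, List, Set

# Hypothetical toy lexicon: concept id -> gloss text. A real system would read
# these glosses from Indo-WordNet; the entries here are illustrative only.
GLOSSES: Dict[str, str] = {
    "bark_sound": "the harsh sound made by a dog",
    "bark_tree": "the tough outer layer covering the trunk of a tree",
    "dog": "a domestic animal that makes a harsh barking sound",
    "trunk": "the main woody stem of a tree",
}

# Hypothetical direct links between concepts (assumed, not taken from any wordnet).
RELATED: Dict[str, List[str]] = {
    "bark_sound": ["dog"],
    "bark_tree": ["trunk"],
}


def gloss_words(concept: str, include_related: bool = True) -> Set[str]:
    """Words in the concept's gloss, plus glosses of directly linked concepts."""
    concepts = [concept]
    if include_related:
        concepts += RELATED.get(concept, [])
    words: Set[str] = set()
    for c in concepts:
        # Whitespace tokenization is a simplifying assumption.
        words.update(GLOSSES[c].lower().split())
    return words


def overlap_score(concept_a: str, concept_b: str) -> int:
    """Relatedness as the count of distinct words shared by the two gloss sets."""
    return len(gloss_words(concept_a) & gloss_words(concept_b))


if __name__ == "__main__":
    print(overlap_score("bark_sound", "dog"))
    print(overlap_score("bark_tree", "dog"))
```

In this toy data the sound sense of bark shares more gloss words with dog than the tree sense does. Function words such as "a" are counted here, so a practical version would likely filter them first.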
Several earlier works have used measures of semantic similarity.
Budanitsky et al. [3] examined five measures of semantic
relatedness and compared their effectiveness in correcting
spelling mistakes, recording an accuracy of 65%. They also found
that the information-content-based measure proposed by Jiang
performs better than those of Hirst [4], Resnik [7], Lin [6], and
Leacock et al. [5]. Reddy et al. [8] compared and evaluated six
measures of semantic relatedness in semantic classification and
labeling for the Hindi language.
They found that the modified Lesk measure performs better than the
other five measures. Sinha et al. [9] proposed a graph-based
unsupervised algorithm for WSD and reported results on datasets
such as SENSEVAL-2 and SENSEVAL-3 using six different measures.
Torres et al. examined combinations of similarity measures, and
their experimental results indicate that a combination of measures
is more accurate than any individual measure. However, these
results cannot be assumed to carry over to the Kannada language. In
this paper, we investigate the benefits of the Lesk technique for
measuring semantic similarity in Kannada. To the best of our
knowledge, no such work has previously been reported for Kannada
WSD. Sinha et al. [14] conducted context overlapping
for Hindi WSD using the extended Lesk algorithm. They used the
sense definitions of the polysemous Hindi target word, taken from
Hindi WordNet, to compute the overlap. The sense with the highest
score is assigned as the appropriate sense of the ambiguous Hindi
word. Kharpra et al. [12] performed domain-specific word sense
disambiguation in three languages: English, Marathi, and Hindi.
They used the main senses of the target word in a particular
domain. Singh et al. reported the effects of stemming, stop-word
removal, and context-window size on the overlap-based Lesk
algorithm for Hindi WSD, and recorded a 9.24% improvement in
accuracy. R. Sawhney et al. [18] proposed a modified WSD algorithm
for the Hindi language that uses the Lesk technique to
disambiguate the ambiguous word. A. R. Pal et al. [20] presented a
knowledge-based method for Bengali WSD. They used the Bengali
WordNet as the knowledge base and achieved an accuracy of 75%. We
also consulted a survey on word sense disambiguation [19] to
understand the merits and demerits of supervised and unsupervised
approaches to WSD.
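The overlap-based sense selection discussed in the works above (score each sense gloss against the words around the target, then keep the highest-scoring sense) can be sketched as follows. This is a simplified illustration, not a reproduction of any cited system: the stop-word list, the sense inventory, and the default window size are assumptions, the glosses are in English rather than Hindi or Kannada, and stemming is omitted.

```python
from typing import Dict, List, Set

# Hypothetical stop-word list and sense inventory, for illustration only.
STOP_WORDS: Set[str] = {"a", "an", "the", "of", "by", "at", "in", "on", "from"}

SENSES: Dict[str, str] = {
    "bark_sound": "the harsh sound made by a dog",
    "bark_tree": "the tough outer layer of a tree trunk",
}


def content_words(text: str) -> Set[str]:
    """Lowercase, split on whitespace, and drop stop words."""
    return {w for w in text.lower().split() if w not in STOP_WORDS}


def context_window(tokens: List[str], index: int, size: int) -> List[str]:
    """Up to `size` tokens on each side of the target position."""
    left = tokens[max(0, index - size):index]
    right = tokens[index + 1:index + 1 + size]
    return left + right


def best_sense(tokens: List[str], index: int,
               senses: Dict[str, str], size: int = 4) -> str:
    """Pick the sense whose gloss shares the most content words with the context."""
    context = content_words(" ".join(context_window(tokens, index, size)))
    scores = {name: len(content_words(gloss) & context)
              for name, gloss in senses.items()}
    # Ties are broken by insertion order of `senses`, an arbitrary choice here.
    return max(scores, key=lambda name: scores[name])


if __name__ == "__main__":
    sentence = "the dog gave a loud bark at the stranger".split()
    print(best_sense(sentence, sentence.index("bark"), SENSES))
```

With a window of three tokens per side, the word "dog" falls outside the context in the example sentence and both senses score zero, which illustrates why window size affects overlap-based methods.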
The rest of this paper is organized as follows. We first describe
the original Lesk technique for WSD, followed by our WSD
algorithm. Next, we present the dataset used in the experiments,
followed by the experimental results and a discussion of them.
Finally, we summarize the outcomes and give recommendations for
future enhancements.