© 2015, IJARCSSE All Rights Reserved
Volume 5, Issue 7, July 2015 ISSN: 2277 128X
International Journal of Advanced Research in
Computer Science and Software Engineering
Research Paper
Available online at: www.ijarcsse.com
A Naïve Bayes Approach for Word Sense Disambiguation
Gurinder Pal Singh Gosal
Department of Computer Science,
Punjabi University, Patiala, Punjab, India
Abstract- Word sense disambiguation (WSD) is the task of automatically selecting the correct sense of a word given
its context, and it helps in resolving many of the ambiguity problems inherent in all natural languages. Statistical
Natural Language Processing (NLP), which is based on probabilistic, stochastic and statistical methods, has been used
to solve many NLP problems. The Naïve Bayes algorithm, a supervised learning technique, has worked well in many
classification problems. In the present work, the WSD task of disambiguating the senses of different words from the
standard corpora of the “1998 SENSEVAL Word Sense Disambiguation (WSD) shared task” is performed by applying the
Naïve Bayes machine learning technique. It is observed that the senses of ambiguous words with fewer parts of speech
are disambiguated more accurately. Another key observation is that the fewer the senses to be disambiguated, the
higher the chance that a word is assigned its correct sense.
Keywords— Word sense disambiguation, WSD, POS-filtering, ambiguity, Naïve Bayes, supervised learning
I. INTRODUCTION
Ambiguity in word senses exists inherently in all natural languages used by humans. Every language has many
words that carry more than one meaning. For example, the word “chair” has one sense meaning a piece of furniture
and another meaning a person chairing, say, a session. We therefore need some context to select the correct sense
in a given situation. Automatically selecting the correct sense given a context is at the core of solving many
ambiguity problems. Word sense disambiguation (WSD) is the task of automatically determining which of the senses
of an ambiguous (target) word is intended in a specific use of the word, taking into consideration the context of
the word’s use [1,2].
Accurate and reliable word sense disambiguation has long been a goal of the natural language community. The
motivation behind performing word sense disambiguation is the belief that many tasks performed under the umbrella
of NLP benefit greatly from properly disambiguated word senses. Statistical NLP, an approach to NLP based on
probabilistic, stochastic and statistical methods, uses machine learning algorithms to solve many NLP problems. As a
branch of artificial intelligence, machine learning involves computationally learning patterns from given data and
applying the learned patterns to new or unseen data. Machine learning is defined by Tom M. Mitchell as follows: “A
computer program is said to learn from experience E with respect to some class of tasks T and performance measure P,
if its performance at tasks in T, as measured by P, improves with experience E” [3].
Learning algorithms can generally be classified into three types: supervised learning, semi-supervised learning
and unsupervised learning. Supervised learning is based on studying the features of positive and negative examples
over a large annotated corpus. Semi-supervised learning uses both labeled and unlabeled data in the learning process
to reduce the dependence on labeled training data. In unsupervised learning, decisions are made on the basis of
unlabeled data alone. The methods of unsupervised learning are mostly built upon clustering techniques,
similarity-based functions and distribution statistics. For automatic WSD, supervised learning is one of the most
successful approaches.
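As an illustration of how a supervised Naïve Bayes classifier can be applied to WSD, a minimal sketch is given below. It is not the exact implementation used in this work: the toy training sentences, the bag-of-words context features, the add-one (Laplace) smoothing and the function names train_naive_bayes and disambiguate are all assumptions made for this example. The sketch picks the sense s that maximizes log P(s) plus the sum of log P(w | s) over the context words w.

```python
import math
from collections import Counter, defaultdict

# Toy, hand-made training examples (hypothetical, NOT taken from SENSEVAL):
# each pair is (context words around the target word "chair", sense label).
TRAIN = [
    ("she sat on the wooden chair near the table".split(), "furniture"),
    ("the chair has four legs and a cushion".split(), "furniture"),
    ("the chair opened the session and welcomed the speakers".split(), "person"),
    ("the committee chair called the meeting to order".split(), "person"),
]


def train_naive_bayes(examples):
    """Count sense frequencies and per-sense word frequencies."""
    sense_counts = Counter()
    word_counts = defaultdict(Counter)
    vocab = set()
    for words, sense in examples:
        sense_counts[sense] += 1
        for w in words:
            word_counts[sense][w] += 1
            vocab.add(w)
    return sense_counts, word_counts, vocab


def disambiguate(context, model):
    """Return the sense with the highest Naive Bayes log-score."""
    sense_counts, word_counts, vocab = model
    total = sum(sense_counts.values())
    best_sense, best_score = None, float("-inf")
    for sense, count in sense_counts.items():
        # Log prior of the sense.
        score = math.log(count / total)
        # Denominator for add-one smoothed word likelihoods.
        denom = sum(word_counts[sense].values()) + len(vocab)
        for w in context:
            if w in vocab:  # words never seen in training are ignored
                score += math.log((word_counts[sense][w] + 1) / denom)
        if score > best_score:
            best_sense, best_score = sense, score
    return best_sense


model = train_naive_bayes(TRAIN)
print(disambiguate("he sat on the chair".split(), model))
print(disambiguate("the chair welcomed the committee".split(), model))
```

With this toy data, the first context is assigned the furniture sense and the second the person sense. Log probabilities are summed instead of multiplying raw probabilities to avoid numerical underflow on longer contexts.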
II. RELATED WORK
When work on the automatic processing of languages began, the problem of WSD drew the interest of researchers
at the same time. The WSD task is therefore one of the oldest tasks in computational linguistics. The problem of WSD
was introduced to the community by Weaver in 1949, when he presented it as a basic task of Machine Translation (MT).
In his well-known Memorandum on Machine Translation, he stressed that this problem of multiple word senses can be
dealt with by looking at the context in which the word occurs [4]. This research brought out the importance of the
immediate context, or adjacent words, in disambiguating senses. The role of the domain in the WSD task was also
analyzed by Weaver, and a lot of work followed in this direction, generating many specialized dictionaries [5, 6]
for sense disambiguation.
For a long time, the research community held the view that machine translation and word sense disambiguation
are tasks that have to be dealt with independently. WSD was thought to be a very difficult task to achieve given
the limited set of resources available at that time. In another study, Reifler discussed the role of syntactic
relations in the task of WSD, stressing the role of grammatical structure [7].