International Research Journal of Engineering and Technology (IRJET) e-ISSN: 2395 -0056
Volume: 04 Issue: 01 | Jan -2017 www.irjet.net p-ISSN: 2395-0072
© 2017, IRJET | Impact Factor value: 5.181 | ISO 9001:2008 Certified Journal | Page 156
SEMANTIC SIMILARITY BETWEEN SENTENCES
PANTULKAR SRAVANTHI¹, DR. B. SRINIVASU²
¹ M.Tech Scholar, Dept. of Computer Science and Engineering, Stanley College of Engineering and Technology for
Women, Telangana, Hyderabad, India
² Associate Professor, Dept. of Computer Science and Engineering, Stanley College of Engineering and Technology
for Women, Telangana, Hyderabad, India
---------------------------------------------------------------------***---------------------------------------------------------------------
Abstract - The task of measuring sentence similarity is
defined as determining how similar the meanings of two
sentences are. Computing sentence similarity is not a trivial
task, due to the variability of natural language expressions.
Measuring the semantic similarity of sentences is closely
related to the semantic similarity between words: it relates
a word to the sentence through their meanings. The
intention is to favor semantic concepts over purely
syntactic measures so that pairs of sentences can be
categorized effectively. Semantic similarity plays a vital role in
Natural Language Processing, Information Retrieval, Text
Mining, Q & A systems, and other text-related research and
application areas.
Traditional similarity measures are based on syntactic
features and other path based measures. In this project, we
evaluated and tested three semantic similarity
approaches: cosine similarity, a path based approach
(Wu-Palmer and shortest path), and a feature based approach.
The proposed approach preprocesses each pair of
sentences to identify the bag of words and then applies
similarity measures such as cosine similarity and path based
similarity. The main contributions of our approach
are a comparison of existing similarity measures and a feature
based measure built on WordNet. In the feature based approach
we perform tagging and lemmatization and generate the
similarity score based on the nouns and verbs. We evaluate the
project output by comparing the existing measures at
different thresholds and by comparing the three
approaches. We conclude that the feature based measure
generates a better semantic score.
Key Words: WordNet, Path based similarity, Features
based Similarity, Word Overlap, Cosine similarity, Word
order similarity, Semantic similarity.
1. INTRODUCTION
Sentence similarity measures are becoming increasingly
important in text-related research and other
application areas. Some dictionary-based measures
capture the semantic similarity between two sentences
and rely heavily on the WordNet semantic dictionary
[1]. Sentence similarity is one of the core elements of Natural
Language Processing (NLP) tasks such as Recognizing
Textual Entailment (RTE) [2] and Paraphrase Recognition [3].
Given two sentences, the task of measuring sentence
similarity is defined as determining how similar the meanings
of the two sentences are. The higher the score, the more similar
the meanings of the two sentences. WordNet and similarity
measures play a more important role in sentence-level
similarity than in document-level similarity [4].
1.1 Problem Description
Determining the similarity between sentences is one of the
crucial tasks in natural language processing (NLP). The aim is
to estimate an accurate similarity score by moving from
syntactic similarity to semantic similarity. Computing sentence
similarity is not a trivial task, due to the variability of natural
language expressions. Measuring the semantic similarity of
sentences is closely related to the semantic similarity between
words. In information retrieval, a similarity measure is used to
assign a ranking score between a query and texts in a corpus
[5].
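As a rough illustration of query-to-text ranking, the sketch below scores each text against a query with cosine similarity over bag-of-words counts, one of the measures named in the abstract. The whitespace tokenizer, the function names, and the three-sentence corpus are assumptions made for this example and are not taken from the paper.

```python
import math
from collections import Counter


def bag_of_words(text):
    """Lowercase and split on whitespace (a deliberately naive tokenizer)."""
    return Counter(text.lower().split())


def cosine_similarity(a, b):
    """Cosine of the angle between two term-count vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(c * c for c in a.values()))
    norm_b = math.sqrt(sum(c * c for c in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # an empty text has no direction to compare
    return dot / (norm_a * norm_b)


def rank(query, corpus):
    """Return (score, text) pairs sorted from most to least similar."""
    q = bag_of_words(query)
    scored = [(cosine_similarity(q, bag_of_words(doc)), doc) for doc in corpus]
    return sorted(scored, key=lambda pair: pair[0], reverse=True)


# Hypothetical corpus and query, for illustration only.
corpus = [
    "the cat sat on the mat",
    "a dog chased the ball",
    "stock prices fell sharply",
]
ranked = rank("the cat on the mat", corpus)
```

A score of this kind reflects word overlap only: a text that expresses the same meaning with different words would share no terms with the query and score zero, which is the gap that WordNet-based measures are meant to address.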
1.2 Basics and background knowledge
This section gives the basic definitions and the different
strategies that can be used.
1.2.1 WordNet
WordNet is the product of a research project at Princeton
University. It is a large lexical database of English. In
WordNet, nouns, verbs, adverbs and adjectives are organized
by a variety of semantic relations into synonym sets
(synsets), each of which represents one concept. Examples of
relations are synonymy, antonymy, hyponymy, member,
similar, domain, cause and so on. In this paper, we are
concerned only with the similarity measure based on nouns
and the synonym relation of WordNet.
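The organization described above can be pictured with a small data-structure sketch: each synset holds a set of lemmas for one concept plus typed relations to other synsets, and two words count as synonyms when some synset contains both. The Synset class, the LEXICON entries, and the helper names are hypothetical and written for this example; they are neither the WordNet API nor WordNet data.

```python
from dataclasses import dataclass, field


@dataclass
class Synset:
    """One concept: a set of interchangeable lemmas plus typed relations."""
    name: str
    pos: str                      # "n", "v", "a" or "r"
    lemmas: frozenset
    relations: dict = field(default_factory=dict)  # relation -> [synset names]


# Hypothetical three-entry lexicon, for illustration only (not WordNet data).
LEXICON = {
    "car.n": Synset("car.n", "n", frozenset({"car", "automobile", "auto"}),
                    {"hypernym": ["vehicle.n"]}),
    "vehicle.n": Synset("vehicle.n", "n", frozenset({"vehicle"}),
                        {"hyponym": ["car.n"]}),
    "walk.v": Synset("walk.v", "v", frozenset({"walk", "stroll"})),
}


def synsets_of(word, pos=None):
    """All synsets containing `word`, optionally limited to one part of speech."""
    return [s for s in LEXICON.values()
            if word in s.lemmas and (pos is None or s.pos == pos)]


def are_synonyms(w1, w2, pos="n"):
    """Two words are synonyms here if some synset of that POS contains both."""
    return any(w2 in s.lemmas for s in synsets_of(w1, pos))
```

With this toy lexicon, `are_synonyms("car", "automobile")` holds because both lemmas sit in the same noun synset, while "car" and "vehicle" are linked only through the hypernym relation and are not treated as synonyms.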
1.2.2 Semantic Similarity
Semantic similarity is sometimes called topological
similarity. It is calculated at the document
level, term level and sentence level. Document-level and
sentence-level similarity are calculated based on the terms which