Optimized Web Search Results Through
Additional Retrieval Lists Inferred Using
WordNet Similarity Measure
Saravanakumar K¹, Aswani Kumar Cherukuri²
School of Information Technology and Engineering,
Vellore Institute of Technology University,
Vellore, India
¹ksaravanakumar@vit.ac.in
²cherukuri@acm.org
Abstract— Search engines have become essential to the usability
of the information available through the Internet, and they
directly support the growth of the World Wide Web. Today their
concern is to improve precision in the top results so that users
need fewer iterative searches to find a concept. Our main
objective is to improve the efficiency of the search results
returned by a search engine in response to a query. We approach
this objective by constructing alternate queries for the main query
given by the user and, through the method proposed here,
selecting the alternate queries that are contextually most similar
to it. Combining the results produced by the main query and the
alternate queries can improve the precision of the top pages. We
evaluated the proposed method and observed that it performed
well in promoting some of the important links of the search
results into the top few pages, and that precision improved
considerably.
Keywords— Information Retrieval; Semantic Relatedness;
Alternate Query Formation; Re-ranking Search Results; WordNet
Similarity.
I. INTRODUCTION
The most important use of the Internet today has become finding
or gathering the information that we need. Most search engines
base the search on the keywords present in the submitted query
[13]: they take those keywords, search pages for the presence of
those words, and return a set of web pages as the result. Because
the result set can contain millions of suggestions, no user can
explore every suggested link to find what they need. This is where
the ranking of suggestions by the search engine comes in. In most
cases, pages are ranked by popularity or by the number of hits a
page receives, rather than by their relevance to the information
the user wants to retrieve. Some popular methods that can serve
as a good platform for Information Retrieval and related domains
are discussed in the work of Baeza-Yates [15]. When processing a
given query, the semantic meaning or synonyms of its keywords,
and sometimes the colloquially used synonyms of a word, must be
considered. Using such knowledge sources, in addition to the
conventional methods, can improve search results considerably;
failing to do so may lead to irrelevant suggestions or poor ranking
of the pages.
An ontology can be used as a knowledge source in the process of
understanding the query [14]. The re-ranking problem has received
attention in recent years as a way to attain high precision at the
very top ranks. In this process, components such as ontologies,
thesauri and dictionaries are used to better understand the query
[14, 16].
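To make the contrast between popularity-based and relevance-based ranking concrete, the following Python sketch ranks the same keyword-matched results in two ways. The toy corpus, hit counts, stop-word list and the keyword-overlap score are all hypothetical illustrations chosen for this example; they are not data or methods taken from this paper.

```python
# Hypothetical toy corpus: (page id, page text, hit count). Not data from the paper.
PAGES = [
    ("p1", "India cricket news and popular scores", 9500),
    ("p2", "Popular tourist spots of India guide", 1200),
    ("p3", "Tourist spots in India popular beaches and forts", 800),
]

# Hypothetical, deliberately tiny stop-word list.
STOP_WORDS = {"of", "and", "in", "the"}


def keywords(text):
    """Lower-case the text, split on whitespace and drop stop words."""
    return {w for w in text.lower().split() if w not in STOP_WORDS}


def keyword_match(query, pages):
    """Keep pages that share at least one keyword with the query."""
    q = keywords(query)
    return [page for page in pages if q & keywords(page[1])]


def rank_by_popularity(results):
    """Order results by hit count alone, ignoring the query."""
    ranked = sorted(results, key=lambda p: p[2], reverse=True)
    return [pid for pid, _, _ in ranked]


def rank_by_overlap(query, results):
    """Order results by the fraction of query keywords each page contains.
    Python's sort is stable, so tied pages keep their original order."""
    q = keywords(query)
    ranked = sorted(results,
                    key=lambda p: len(q & keywords(p[1])) / len(q),
                    reverse=True)
    return [pid for pid, _, _ in ranked]


query = "popular tourist spots of India"
results = keyword_match(query, PAGES)
print(rank_by_popularity(results))      # → ['p1', 'p2', 'p3']
print(rank_by_overlap(query, results))  # → ['p2', 'p3', 'p1']
```

Ranking by hits puts p1, a cricket page that shares only two of the four query keywords, first; ranking by keyword overlap puts it last.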
For a query such as “popular tourist spots of India”, most search
processes would explore and retrieve pages matching the keywords
popular, tourist, spots and India. Conceptually, the query could
also mean “famous tourist spots of India”, “Top tourist destinations
of India” or “Famous travel destinations of India”. These
alternatives show that, although most of their words are direct
synonyms of the keywords in the given query, some are not, and
some may not even appear in a thesaurus. Although these
alternatives match the given query well, most search engines do
not consider the pages that contain these alternate keywords.
Hence, the semantic meaning, the synonyms and the colloquially
used related words should all be taken into consideration during
search. This method of finding alternate queries from various
resources is sometimes referred to in the literature as
paraphrasing, which is applied in a variety of natural language
processing applications such as question answering systems, text
categorization, text summarization and machine translation [8].
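The following Python sketch illustrates this point with the example query above. An exact keyword match rejects “Famous travel destinations of India”, while a match that also accepts listed alternatives for each keyword accepts it. The `ALTERNATIVES` table is a hypothetical hand-written stand-in, not WordNet output; as noted above, some of these alternatives (for example “top” for “popular”) may not be listed as synonyms in a thesaurus at all.

```python
# Hypothetical, deliberately tiny stop-word list.
STOP_WORDS = {"of", "the", "and", "in"}

# Hypothetical hand-written table of synonyms and colloquial alternatives.
# It stands in for a lexical resource; it is NOT taken from WordNet or the paper.
ALTERNATIVES = {
    "popular": {"famous", "top"},
    "tourist": {"travel"},
    "spots": {"destinations", "places"},
}


def keywords(text):
    """Lower-case the text, split on whitespace and drop stop words."""
    return {w for w in text.lower().split() if w not in STOP_WORDS}


def exact_match(query, page):
    """True if the page contains every query keyword verbatim."""
    return keywords(query) <= keywords(page)


def expanded_match(query, page):
    """True if, for every query keyword, the page contains that keyword
    or one of its listed alternatives."""
    page_words = keywords(page)
    return all(({kw} | ALTERNATIVES.get(kw, set())) & page_words
               for kw in keywords(query))


query = "popular tourist spots of India"
page = "Famous travel destinations of India"
print(exact_match(query, page))     # False
print(expanded_match(query, page))  # True
```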
The approach we propose consists of the following steps. First,
we pre-process the query to remove the stop words, so that only
the keywords that are candidate words in deciding the semantics
of the sentence are taken into consideration [5]. Next, we
construct a vocabulary for those candidate words from their
synonyms in WordNet [6]. We found that other ontology-based
dictionaries for finding semantic and lexical meaning provide
good knowledge but sometimes lack efficiency and precision.
WordNet is designed to be used as a dictionary or thesaurus and to
support text analysis; hence, we used WordNet as the lexical
source for solving our problem. Finally, we construct alternate
queries from the candidate words of the given query, analyze the
results of those queries, and re-rank the final result of the given
query accordingly.
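The four steps can be sketched in Python as follows, under several loudly stated assumptions. The `SYNONYMS` table is a hypothetical hand-written stand-in for WordNet lookups (in practice it could be filled from a WordNet interface such as NLTK's `wordnet.synsets()`); `RESULTS` is toy data standing in for a search engine; and the re-ranking step uses reciprocal rank fusion, a standard fusion rule swapped in here because this section does not specify the re-ranking scheme. The WordNet-similarity selection of the most similar alternate queries mentioned in the abstract is omitted, so every combination is kept.

```python
from itertools import product

# Hypothetical, deliberately tiny stop-word list.
STOP_WORDS = {"a", "an", "and", "in", "of", "the"}

# Hypothetical synonym table standing in for WordNet lookups (assumption;
# these entries are illustrative, not actual WordNet output).
SYNONYMS = {
    "popular": ["famous"],
    "spots": ["destinations", "places"],
}

# Hypothetical ranked result lists per query (toy data, not from the paper).
RESULTS = {
    "popular tourist spots india": ["a.com", "b.com", "c.com"],
    "popular tourist destinations india": ["c.com", "a.com"],
    "famous tourist spots india": ["c.com", "d.com"],
}


def candidate_words(query):
    """Step 1: remove stop words and keep the candidate keywords in order."""
    return [w for w in query.lower().split() if w not in STOP_WORDS]


def build_vocabulary(words):
    """Step 2: map each candidate word to itself plus its synonyms."""
    return {w: [w] + SYNONYMS.get(w, []) for w in words}


def alternate_queries(words, vocab):
    """Step 3: pick one term per candidate word in every combination,
    excluding the original query itself."""
    original = " ".join(words)
    combos = (" ".join(terms) for terms in product(*(vocab[w] for w in words)))
    return [q for q in combos if q != original]


def fuse(result_lists, k=60):
    """Step 4 (assumed): reciprocal rank fusion of several ranked lists.
    A URL's score is the sum of 1 / (k + rank) over the lists containing it."""
    scores = {}
    for results in result_lists:
        for rank, url in enumerate(results, start=1):
            scores[url] = scores.get(url, 0.0) + 1.0 / (k + rank)
    # Highest score first; ties broken alphabetically for determinism.
    return sorted(scores, key=lambda url: (-scores[url], url))


words = candidate_words("popular tourist spots of India")
queries = [" ".join(words)] + alternate_queries(words, build_vocabulary(words))
print(fuse([RESULTS.get(q, []) for q in queries]))
# → ['c.com', 'a.com', 'b.com', 'd.com']
```

In this toy run, c.com, ranked third by the original query alone, moves to the top because two alternate queries also retrieve it. The constant k = 60 is a common default for reciprocal rank fusion, but any other merging rule could be substituted at step 4.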
The rest of this paper is organized as follows: section 2
discusses the related work, section 3 discusses the
978-1-4799-4674-7/14/$31.00 ©2014 IEEE