Optimized Web Search Results Through Additional Retrieval Lists Inferred Using WordNet Similarity Measure

Saravanakumar K 1, Aswani Kumar Cherukuri 2
School of Information Technology and Engineering, Vellore Institute of Technology University, Vellore, India
1 ksaravanakumar@vit.ac.in 2 cherukuri@acm.org

Abstract— Search engines have become an essential part of using the information available on the Internet, and they directly support the growth of the World Wide Web. Their current concern is to raise the precision of the top results so that a user does not have to search repeatedly for the same concept. Our main objective is to improve the quality of the search results that a search engine suggests in response to a query. We approach this objective by constructing alternate queries for the main query given by the user and selecting the contextually most similar alternate queries with the method proposed here. Combining the results produced by the main query and the alternate queries could improve the precision in the top pages. We evaluated the proposed method and observed that it moved some of the important links in the search results into the top few pages, and that precision improved considerably.

Keywords— Information Retrieval; Semantic Relatedness; Alternate Query Formation; Re-ranking Search Results; WordNet Similarity.

I. INTRODUCTION

Finding or gathering the information we need has become the most important use of the Internet today. Most search engines base the search on the keywords present in the submitted query [13]. They look for pages that contain those words and return the matching web pages as the result set. Because the result set can contain millions of suggestions, no one can explore every suggested link to find what they need.
This is where the ranking of suggestions by the search engine comes in. In most cases, pages are ranked by popularity or by the number of hits a page has, not by the relevance of the information the user wants to retrieve. Baeza-Yates, in their work [15], discuss some popular methods that serve as a good foundation for Information Retrieval and related domains. When processing a query, the semantic meaning of its keywords, their synonyms, and sometimes their colloquially used synonyms must be considered. Using such knowledge sources in addition to the conventional methods would improve the search results considerably; failing to do so may lead to irrelevant suggestions or a poor ranking of the pages. An ontology could be used as a knowledge source in the process of understanding the query [14]. The re-ranking problem has received attention in recent times as a way to attain high precision at the very top ranks. In this process, components such as ontologies, thesauri, and dictionaries are used to better understand the query [14, 16].

For the query “popular tourist spots of India”, most search processes would retrieve pages matching the keywords popular, tourist, spots, and India. Conceptually the query could also mean “famous tourist spots of India”, “Top tourist destinations of India”, or “Famous travel destinations of India”. These queries show that although most of the substituted words are direct synonyms of keywords in the given query, some are not, and some may not even appear in a thesaurus. Although these alternate queries match the given query well, most search engines do not consider the pages that contain these alternate keywords. Hence the semantic meaning, the synonyms, and the colloquially used related words should all be taken into consideration when searching.
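As a rough illustration of the example above, the following Python sketch forms alternate queries by substituting synonyms for the keywords of a query. The stop-word set, the synonym table, and the function name `alternate_queries` are assumptions made for illustration; the table is a hand-written stand-in for a lexical resource such as WordNet, and the sketch is not the method evaluated in this paper.

```python
from itertools import product

# Hypothetical stop-word set and synonym table (assumptions for this
# sketch); a real system would draw them from a lexical resource.
STOP_WORDS = {"of", "the", "in", "a", "an"}
SYNONYMS = {
    "popular": ["famous", "top"],
    "tourist": ["travel"],
    "spots": ["destinations"],
}


def alternate_queries(query):
    """Return queries formed by swapping keywords for listed synonyms."""
    choices = []
    for token in query.split():
        key = token.lower()
        if key in STOP_WORDS:
            # Stop words carry no query semantics, so they stay fixed.
            choices.append([token])
        else:
            # Offer the original keyword first, then any known synonyms.
            choices.append([token] + SYNONYMS.get(key, []))
    variants = [" ".join(combo) for combo in product(*choices)]
    # The first combination reuses every original token, i.e. the
    # original query itself, so it is dropped.
    return variants[1:]


if __name__ == "__main__":
    for alt in alternate_queries("popular tourist spots of India"):
        print(alt)
```

For the query “popular tourist spots of India”, the generated list includes “famous tourist spots of India” and “famous travel destinations of India”. A practical system would still need to select the contextually most similar alternates rather than keep every combination.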
This method of finding alternate queries using various resources is sometimes treated in the literature as paraphrasing, which is applied in a variety of natural language processing applications such as question answering, text categorization, text summarization, and machine translation [8].

The approach we propose consists of the following steps. First, we pre-process the query to remove the stop words; only the keywords, which are the candidate words that determine the semantics of the sentence, are taken into consideration [5]. Then, we construct a vocabulary for those candidate words from their synonyms in WordNet [6]. We have found that other ontology-based dictionaries for semantic and lexical meaning provide good knowledge but sometimes lack efficiency and precision. WordNet is designed to be used as a dictionary or thesaurus and to support text analysis; hence, we used WordNet as the lexical source for our problem. Finally, we construct the alternate queries from the candidate words of the given query, analyze the results of those queries, and re-rank the final result of the given query accordingly.

The rest of this paper is organized as follows: Section 2 discusses the related work, Section 3 discusses the
978-1-4799-4674-7/14/$31.00 ©2014 IEEE