Extended Semantic based Boolean Information Retrieval Algorithm for User-driven Query

Vachhani Upama
ME Student, Department of Computer Engineering, GEC, Sector-28, Gandhinagar, India.

S. M. Shah
Associate Professor, Department of Computer Engineering, GEC, Sector-28, Gandhinagar, India.

Abstract - Information Retrieval (IR) is essentially a matter of deciding which documents within a large collection satisfy a user’s information need. Those documents are called relevant documents, and the documents that are not on the topic specified by the user are said to be non-relevant. An existing SBIR algorithm uses the lexical database WordNet to find synonyms of a single-word query term, on the premise that the absence of the given term in a document does not necessarily mean that the document is not relevant. In this paper, a new algorithm is proposed which works with compound terms and uses a modified Porter Stemming Algorithm to solve some stemming errors found in the Porter Stemmer Algorithm proposed by M. F. Porter. This will improve recall, as more relevant documents will be retrieved. We propose to involve the user in the search process through interactive feedback on word senses. This will further improve recall by retrieving more user-relevant results.

Keywords - Information Retrieval, WordNet, Porter Stemming Algorithm, Boolean Information Retrieval.

I. INTRODUCTION

The plentiful information stored in online databases can be highly advantageous for both people and automated computer systems that seek information, if it can be retrieved efficiently. Information Retrieval is the process of finding documents in a corpus based on a specific query. The main idea is to locate documents that contain the terms that users specify in their queries.

A. Boolean Retrieval Model

Most classical information retrieval models retrieve documents based on lexicographic term matching only. The Boolean retrieval model, based on set theory, is the very first retrieval model, proposed three decades ago. It is an exact-match model: a document is retrieved only if the exact term exists in it. In professional search environments such as legal search or patent search, users expect many retrieval results, i.e., there is ordinarily an emphasis on recall. Recent surveys have also confirmed that professional searchers continue to have a strong preference for Boolean queries because of the precise nature of the Boolean model. Popular medical databases like MedLine and PubMed, which allow searching for articles on biology and medicine, and legal databases like Westlaw, which allows searching for legal documents, are based on the Boolean retrieval model. Many search engines also use this information retrieval model.

B. Query Expansion

Query expansion is an effective way of enhancing the performance of information retrieval systems. The basic process is to select new terms based on the initial query and then combine them with the initial query to form a new query [3]. Automatic query expansion is more efficient for users on simpler search tasks, whereas interactive query expansion is more productive for more complex search tasks. Irrespective of the method used, the key point is to find the best words with which to expand the query. The aim of query expansion is to reduce the mismatch between query and documents by expanding the query terms with words or phrases that are synonymous with them. This has an impact on the recall of most information retrieval systems. Despite the significance of Boolean queries in professional search, there has not been much research on assisting information professionals in expanding search queries.

C. Stemming

Generally in IR applications, stemming is done before the index is created.
A stemming algorithm is a procedure of linguistic normalization in which the variant forms of a word are reduced to a common form, for example, (operates, operation, operatives, operational) -> oper. The terms extracted from documents are stemmed using a stemming algorithm. The purpose of this step is to remove various suffixes, reduce the number of distinct words, obtain exactly matching stems, and save memory space and time. It is important to appreciate that we use stemming with the expectation of enhancing the performance of IR systems.

D. Inverted Index

Finding information is not the only action that exists in an Information Retrieval (IR) system. Indexing, for instance, refers to how information in the system is represented. The documents are represented through a set of index terms or keywords. These terms are extracted from the text of the documents. An inverted index is the standard method for supporting queries on large text databases; there are no practical alternatives to inverted indexes that provide sufficiently fast query evaluation. A positional inverted index is a two-level structure.

International Journal of Engineering Research & Technology (IJERT) ISSN: 2278-0181 www.ijert.org IJERTV4IS050514 (This work is licensed under a Creative Commons Attribution 4.0 International License.) Vol. 4 Issue 05, May-2015 425
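To make the two-level structure mentioned in Section D more concrete, the following is a minimal Python sketch and not the implementation used in this paper. It assumes that the first level is the term dictionary and the second level holds, for each document, the list of positions at which the term occurs. The function names (tokenize, build_positional_index, boolean_and, boolean_or) and the toy corpus are hypothetical, and stemming is deliberately left out so that the index itself stays easy to follow.

```python
import re
from collections import defaultdict


def tokenize(text):
    """Lowercase the text and split it into alphabetic tokens (no stemming)."""
    return re.findall(r"[a-z]+", text.lower())


def build_positional_index(docs):
    """Build a positional inverted index of the form {term: {doc_id: [positions]}}.

    Assumed reading of "two-level": level 1 is the term dictionary,
    level 2 is the per-document list of token positions.
    """
    index = defaultdict(dict)
    for doc_id, text in docs.items():
        for pos, term in enumerate(tokenize(text)):
            index[term].setdefault(doc_id, []).append(pos)
    return dict(index)


def boolean_and(index, *terms):
    """Exact-match Boolean AND: ids of documents containing every term."""
    postings = [set(index.get(t, {})) for t in terms]
    return set.intersection(*postings) if postings else set()


def boolean_or(index, *terms):
    """Exact-match Boolean OR: ids of documents containing at least one term."""
    result = set()
    for t in terms:
        result |= set(index.get(t, {}))
    return result


# Hypothetical toy corpus, used only for illustration.
docs = {
    1: "Boolean retrieval is an exact match model",
    2: "Query expansion improves recall",
    3: "Boolean queries favour recall in patent search",
}
index = build_positional_index(docs)

print(index["boolean"])                        # postings for one term
print(boolean_and(index, "boolean", "recall")) # documents with both terms
print(boolean_or(index, "boolean", "recall"))  # documents with either term
```

Because the Boolean model is based on set theory, AND and OR reduce to set intersection and union over the document ids stored in the postings. The stored positions are not needed for these two operators; they are kept because position information is what would allow multi-word (compound) terms to be matched as adjacent tokens.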