DOI: http://dx.doi.org/10.26483/ijarcs.v9i1.5505
Volume 9, No. 1, January-February 2018
International Journal of Advanced Research in Computer Science
RESEARCH PAPER
Available Online at www.ijarcs.info
ISSN No. 0976-5697
© 2015-19, IJARCS All Rights Reserved

A SURVEY PAPER ON INFORMATION RETRIEVAL SYSTEM

Arpit Deo, Research Scholar, IES IPS Academy, Indore, India
Jayesh Gangrade, Associate Professor, IES IPS Academy, Indore, India
Shweta Gangrade, Assistant Professor, IES IPS Academy, Indore, India

Abstract: Information retrieval is the process of obtaining and presenting the most relevant information from a large collection of information resources according to the user's need. The tremendous growth in information resources on the Internet makes information retrieval a tedious and difficult task for users. Because of information overload, better techniques are needed to retrieve the most relevant information from the web. This paper presents an information retrieval system that uses the particle swarm optimization (PSO) algorithm. In the presented system, all HTML tags are removed to extract the text from web documents. Stop words and special characters are then removed from the extracted text so that only meaningful content remains. TF-IDF is used for feature selection. PSO is then used to identify and refine the feature set, and the selected features are stored in a database that is used for the information retrieval process. On the other hand, the input query is converted into several semantically similar query strings. These query strings are compared with the feature sets in the database by using the cosine similarity function. The most similar text is retrieved as the output of the information retrieval system.

Keywords: Information retrieval system; feature extraction; PSO optimization; similar query generator; similarity measure

I. INTRODUCTION

Today, the Internet has become the largest information source.
The World Wide Web is a collection of many interlinked hypertext documents. It provides a huge amount of information, which is accessed via the Internet by using the Hypertext Transfer Protocol. The web provides many types of data, such as text, images, videos and other multimedia data. The tremendous growth of information resources makes information retrieval a difficult and tedious task for users. For that reason, users cannot access relevant information effectively [1]. Information retrieval plays a vital role in web search engines in accessing the most relevant information according to the user's input query. It is a core component of web search engines. Information retrieval is the process of obtaining and presenting the most relevant information from a large collection of information resources according to the user's input query. Whenever a user needs to access information, it is necessary to enter a formal statement into a search engine. This formal statement is known as the search engine's input query. A query does not identify a single information resource in the collection. Instead, several information resources that match the input query are presented, ordered from most relevant to least relevant [2]. Web search engines such as Bing, Yahoo, Google, Excite and AltaVista are used by millions of users across the world to access information on any topic. Information retrieval systems are used in many application areas, such as digital libraries, information filtering, recommendation systems, media search and image retrieval [3].

A. PSO Algorithm

PSO is an evolutionary computation method inspired by the simulation of social behavior. It is based on bird flocking. It solves population-based optimization problems by iterative computation.
It starts from an initial population of random solutions to the problem and then produces improved candidate solutions. These candidate solutions are known as particles. The algorithm is initialized with potential solutions of a population-based optimization problem, and every potential solution (particle) has a randomized velocity. Let $S_w$ be the size of the swarm. Each particle $i_k$ is initialized with a random position $P_k$ and velocity $V_k$. $F_k$ is the threshold objective function; it takes the positional coordinates of the particles as input. Every particle is associated with its best result in the problem space, called $p_{best}$. The global best value is represented by $g_{best}$. In every iteration, the $p_{best}$ location and velocity of each particle are updated, and the function is re-evaluated with the changed positions and velocities.

The steps of the PSO algorithm are as follows:
1) Initialization:
a) Initialize a population of potential solutions with random positions and velocities.
b) Evaluate the fitness of each member of the population.
c) Store the personal best position of each member of the population in memory.
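
The initialization step described above can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the sphere function standing in for the objective function, the search bounds, the velocity range, and the names `Particle`, `sphere`, `init_swarm` and `global_best` are all hypothetical, and lower fitness is assumed to be better.

```python
import random
from dataclasses import dataclass
from typing import Callable, List, Optional, Sequence


@dataclass
class Particle:
    position: List[float]
    velocity: List[float]
    best_position: List[float]  # personal best (p_best) location
    best_fitness: float         # fitness value at the personal best


def sphere(x: Sequence[float]) -> float:
    """Hypothetical objective function: sum of squares (assumption)."""
    return sum(v * v for v in x)


def init_swarm(
    swarm_size: int,
    dim: int,
    objective: Callable[[Sequence[float]], float],
    lower: float = -5.0,   # assumed search bounds
    upper: float = 5.0,
    seed: Optional[int] = None,
) -> List[Particle]:
    rng = random.Random(seed)
    span = upper - lower
    swarm = []
    for _ in range(swarm_size):
        # a) random position and velocity for each particle
        position = [rng.uniform(lower, upper) for _ in range(dim)]
        velocity = [rng.uniform(-span, span) for _ in range(dim)]
        # b) evaluate the fitness of the particle
        fitness = objective(position)
        # c) store the personal best (a copy of the starting position)
        swarm.append(Particle(position, velocity, list(position), fitness))
    return swarm


def global_best(swarm: List[Particle]) -> Particle:
    """Particle holding the best personal best (g_best), assuming minimization."""
    return min(swarm, key=lambda p: p.best_fitness)


if __name__ == "__main__":
    swarm = init_swarm(swarm_size=20, dim=3, objective=sphere, seed=0)
    print(global_best(swarm).best_fitness)
```

Copying the position into `best_position` keeps the personal best unchanged when the particle later moves, which is why the sketch stores `list(position)` rather than a reference to the same list.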