DOI: http://dx.doi.org/10.26483/ijarcs.v9i1.5505
Volume 9, No. 1, January-February 2018
International Journal of Advanced Research in Computer Science
RESEARCH PAPER
Available Online at www.ijarcs.info
© 2015-19, IJARCS All Rights Reserved 778
ISSN No. 0976-5697
A SURVEY PAPER ON INFORMATION RETRIEVAL SYSTEM
Arpit Deo
Research Scholar
IES IPS Academy
Indore, India
Jayesh Gangrade
Associate Professor
IES IPS Academy
Indore, India
Shweta Gangrade
Assistant Professor
IES IPS Academy
Indore, India
Abstract: Information retrieval is the process of obtaining and presenting the most relevant information from a large collection of information
resources according to the user's need. The tremendous growth of information resources on the Internet makes information retrieval
a tedious and difficult task for users. Because of this information overload, better techniques are needed to retrieve the most relevant information
from the web. This paper presents an information retrieval system based on the particle swarm optimization (PSO) algorithm. In the presented system,
all HTML tags are removed to extract the text from web documents. Stop words and special characters are then removed from the extracted text so that
only meaningful content remains. TF-IDF is used for feature selection, and PSO is then used to identify and refine the
feature set. The selected features are stored in a database that is used in the information retrieval process. On the other hand, the input query is
converted into several semantically similar query strings. These query strings are compared with the feature sets in the database
using the cosine similarity function. The most similar text is returned as the result of the information retrieval system.
Keywords: Information retrieval system; feature extraction; PSO optimization; similar query generator; similarity measure
I. INTRODUCTION
Today the Internet has become the largest source of
information. The World Wide Web is a collection of
interlinked hypertext documents. It provides a huge amount
of information, which is accessed over the Internet using the
Hypertext Transfer Protocol (HTTP). The web provides many
types of data, such as text, images, videos and other
multimedia. The tremendous growth of information
resources makes information retrieval a difficult and
tedious task for users. As a result, users cannot
access relevant information effectively [1].
Information retrieval plays a vital role in web search
engines, which use it to return the most relevant information
for the user's input query. It is a mainstream technique and
the foundation of web search engines.
Information retrieval is the process of obtaining and
presenting the most relevant information from a large collection
of information resources according to the user's input query.
Whenever a user needs to access information, the user
enters a formal statement into a search engine.
This formal statement is known as the search engine's input
query. A query does not identify a single information
resource in the collection. Instead, several information
resources that match the input query are presented,
ordered from the most relevant to the least relevant [2].
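As a rough illustration of ranking resources against a query, the following Python sketch scores a few toy documents with TF-IDF weights and cosine similarity, the two measures named in the abstract. The whitespace tokenizer, the smoothed IDF formula, the function names and the sample documents are assumptions made for this example and are not taken from the cited systems.

```python
import math
from collections import Counter


def tokenize(text):
    # Assumption: lowercase whitespace splitting stands in for real preprocessing.
    return text.lower().split()


def tf_idf_vectors(docs):
    """Build one sparse TF-IDF vector (dict) per document, plus the IDF table."""
    tokenized = [tokenize(d) for d in docs]
    n = len(tokenized)
    df = Counter()
    for tokens in tokenized:
        df.update(set(tokens))  # document frequency: count each term once per document
    # Assumption: log(N / df) + 1 so that terms found in every document keep some weight.
    idf = {t: math.log(n / df[t]) + 1.0 for t in df}
    vectors = []
    for tokens in tokenized:
        tf = Counter(tokens)
        vectors.append({t: (c / len(tokens)) * idf[t] for t, c in tf.items()})
    return vectors, idf


def cosine(a, b):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(w * b.get(t, 0.0) for t, w in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def rank(query, docs):
    """Return (score, document index) pairs, most similar first."""
    vectors, idf = tf_idf_vectors(docs)
    q_tokens = tokenize(query)
    q_tf = Counter(q_tokens)
    # Query terms that never occur in the collection are ignored.
    q_vec = {t: (c / len(q_tokens)) * idf[t] for t, c in q_tf.items() if t in idf}
    scored = [(cosine(q_vec, v), i) for i, v in enumerate(vectors)]
    scored.sort(key=lambda pair: (-pair[0], pair[1]))
    return scored


if __name__ == "__main__":
    sample_docs = [
        "particle swarm optimization for feature selection",
        "cooking pasta with tomato sauce",
        "swarm intelligence and optimization",
    ]
    for score, index in rank("swarm optimization", sample_docs):
        print(f"{score:.3f}  {sample_docs[index]}")
```

Documents that share no term with the query receive a score of zero and fall to the end of the list, which mirrors the most-to-least-relevant ordering described above.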
Web search engines such as Bing, Yahoo, Google, Excite,
AltaVista etc. are used by millions of users to access
information across the world on any topic.
Information retrieval systems are used in many application
areas, such as digital libraries, information filtering,
recommendation systems, media search and image retrieval
[3].
A. PSO Algorithm
PSO is an evolutionary computation method inspired
by the simulation of social behavior, specifically bird
flocking. It solves optimization problems through iterative
computation over a population. It starts from an initial
population of random candidate solutions to the problem, known
as particles, and then iteratively improves them.
The algorithm is initialized with a population of potential
solutions to the optimization problem, and each
potential solution (particle) is given a randomized velocity.
Let S_w be the size of the swarm. Each particle i_k is
initialized with a random position P_k and velocity V_k. F_k is
the threshold objective function, which takes the positional
coordinates of a particle as input. Each particle is associated
with the best result it has found so far in the problem space,
called p_best. The global best value is represented by g_best.
In every iteration, the p_best location and velocity of each
particle are updated, and the function is re-evaluated at the
changed positions and velocities.
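The description above can be illustrated with a minimal Python sketch. It is not the implementation used in the presented system. It assumes a minimization problem and the commonly used velocity update with an inertia weight w and acceleration coefficients c1 and c2. The parameter values, the search bounds, the function name pso and the sphere test function are likewise assumptions chosen for the example.

```python
import random


def pso(objective, dim, swarm_size=20, iterations=100,
        bounds=(-5.0, 5.0), w=0.7, c1=1.5, c2=1.5, seed=0):
    """Minimize `objective` with a basic particle swarm (illustrative sketch)."""
    rng = random.Random(seed)
    lo, hi = bounds
    span = hi - lo

    # Each particle gets a random position and a small random velocity.
    positions = [[rng.uniform(lo, hi) for _ in range(dim)]
                 for _ in range(swarm_size)]
    velocities = [[rng.uniform(-span, span) * 0.1 for _ in range(dim)]
                  for _ in range(swarm_size)]

    # p_best: best position found so far by each particle.
    pbest_pos = [p[:] for p in positions]
    pbest_val = [objective(p) for p in positions]

    # g_best: best position found so far by the whole swarm.
    g = min(range(swarm_size), key=lambda k: pbest_val[k])
    gbest_pos, gbest_val = pbest_pos[g][:], pbest_val[g]

    for _ in range(iterations):
        for k in range(swarm_size):
            for d in range(dim):
                r1, r2 = rng.random(), rng.random()
                # Assumed standard update: inertia + pull to p_best + pull to g_best.
                velocities[k][d] = (
                    w * velocities[k][d]
                    + c1 * r1 * (pbest_pos[k][d] - positions[k][d])
                    + c2 * r2 * (gbest_pos[d] - positions[k][d])
                )
                # Move the particle and keep it inside the search bounds.
                new_pos = positions[k][d] + velocities[k][d]
                positions[k][d] = min(hi, max(lo, new_pos))

            # Re-evaluate the objective at the new position.
            value = objective(positions[k])
            if value < pbest_val[k]:
                pbest_val[k] = value
                pbest_pos[k] = positions[k][:]
                if value < gbest_val:
                    gbest_val = value
                    gbest_pos = positions[k][:]

    return gbest_pos, gbest_val


def sphere(x):
    # Hypothetical test objective with its minimum of 0 at the origin.
    return sum(v * v for v in x)


if __name__ == "__main__":
    best_position, best_value = pso(sphere, dim=2)
    print(best_position, best_value)
```

In a feature selection setting, the objective would score a candidate feature subset instead of the sphere function; that mapping is not specified in this section and is left out of the sketch.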
The steps of the PSO algorithm are as follows:
1) Initialization:
a) Initialize a population of potential solutions with
random positions and velocities.
b) Evaluate the fitness of each particle.
c) Store the personal best position of each
particle in memory.