Bees Swarm Optimization based Approach for Web Information Retrieval
Habiba Drias, Hadia Mosteghanemi
Department of Computer Science
USTHB, LRIA, Algiers, Algeria
hdrias@usthb.dz
Abstract
This paper deals with large-scale information retrieval, with the aim of contributing to web searching. The document collections considered are huge and difficult to handle with classical approaches: the larger the collection, the more powerful the required approach. A Bees Swarm Optimization algorithm, called BSO-IR, is designed to explore the prohibitive number of documents and find the information the user needs. Extensive experiments were performed on the CACM and RCV1 collections, as well as on larger corpora, in order to show the benefit of this approach over the classical one. Performance in terms of solution quality and runtime is compared between BSO and exact algorithms. Numerical results show that BSO-IR outperforms previous works in terms of scalability while yielding comparable quality.
Keywords: web information retrieval; very large collections of documents; scalability; evolutionary algorithms; swarm intelligence; BSO; classic approach
1. Introduction
With the exponentially growing amount of information on the web, the classic search process lacks efficiency. Innovative information retrieval (IR) tools have become necessary to cope with the complexity induced by this tremendous volume of information. Many research directions contribute to handling the complexity of the problem; distributed information retrieval and personalized information source selection are two examples. Recent works consider user and source profiles in order to restrict the search to the sources whose profile matches the user's [6,7]. In this way, much of the information is pruned and the response time of such systems is reduced.
In this study, artificial intelligence approaches, more precisely bee swarm optimization (BSO) algorithms, are designed for this purpose. We show through this work that evolutionary approaches may help to mitigate the complexity issue. The original BSO meta-heuristic was introduced in [4] and applied successfully to the satisfiability problem. Its principles and framework are adapted here to the problem addressed in the present study.
The idea behind addressing web information retrieval with a BSO-based approach is to prune the prohibitive search space so that only promising documents are browsed, and results are therefore obtained in a reasonable amount of time. This meta-heuristic belongs to the vast and well-recognized domain of swarm intelligence. Many works have been undertaken in this area and applied to numerous public and industrial sectors; the most widely used method is particle swarm optimization (PSO). The present article develops a BSO approach, which differs from PSO and is inspired by the collective behavior of bees.
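The pruning argument can be made concrete with a back-of-the-envelope count of similarity evaluations. The symbols below (MaxIter iterations, k bees, n_local local-search steps per bee) and their values are hypothetical illustrations, not the settings used in this paper; the bound assumes that a bee-based search evaluates at most one document per local-search step, whereas the classical approach scores all N documents of the collection.

```latex
\begin{aligned}
E_{\mathrm{classic}} &= N, \qquad E_{\mathrm{BSO}} \le \mathrm{MaxIter} \cdot k \cdot n_{\mathrm{local}} \\
\mathrm{MaxIter} = 50,\ k = 10,\ n_{\mathrm{local}} = 20 &\;\Rightarrow\; E_{\mathrm{BSO}} \le 10\,000 \\
N = 804\,414 \ (\text{RCV1}) &\;\Rightarrow\; \frac{E_{\mathrm{BSO}}}{E_{\mathrm{classic}}} \le \frac{10\,000}{804\,414} \approx 1.24\,\%
\end{aligned}
```

The bound concerns cost only: whether the small evaluated fraction actually contains the relevant documents is what the experiments have to establish.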
BSO results from the aggregation of individual behaviors governed by very simple rules. It offers a self-organized working model based on decentralized logic and founded on the cooperation of units that have only local information.
Real bees communicate with one another by means of a dance. When a bee exploring a region finds a rich food source, it performs an active dance to draw the attention of its nestmates. The bees then exploit the discovered area as much as possible, and they repeat this foraging process until their needs are satisfied.
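The foraging loop described above can be turned into a search procedure along the following lines. This is a minimal Python sketch under stated assumptions, not the BSO-IR algorithm itself: documents and the query are hypothetical term-frequency vectors (`Counter` objects) scored by cosine similarity; a solution is a single document index; each bee scans a small window of indices (`radius`) starting `stride` positions away from the reference solution; the "dance table" lists the best document reported by each bee, and the next reference solution is taken from it while a taboo list prevents exploiting the same region twice. The names `bso_search`, `stride`, `radius` and `max_iter` are illustrative.

```python
import math
import random
from collections import Counter


def cosine(u, v):
    """Cosine similarity between two sparse term-frequency vectors (Counters)."""
    dot = sum(w * v[t] for t, w in u.items())
    nu = math.sqrt(sum(w * w for w in u.values()))
    nv = math.sqrt(sum(w * w for w in v.values()))
    return dot / (nu * nv) if nu and nv else 0.0


def bso_search(docs, query, n_bees=3, stride=2, radius=1, max_iter=10, seed=0):
    """Bee-swarm search sketch; returns (best document index, number evaluated).

    Hypothetical choices: a solution is one document index; bee b scans the
    window of +/- radius indices starting b * stride positions from the
    reference solution; the dance table holds each bee's best find.
    """
    rng = random.Random(seed)
    n = len(docs)
    score = {}  # cache of evaluated documents: the only ones ever scored

    def fitness(i):
        if i not in score:
            score[i] = cosine(docs[i], query)
        return score[i]

    s_ref = rng.randrange(n)  # initial reference solution
    best = s_ref
    taboo = []  # reference solutions already exploited
    for _ in range(max_iter):
        taboo.append(s_ref)
        dance = []  # dance table: best document reported by each bee
        for b in range(n_bees):
            start = (s_ref + b * stride) % n
            area = [(start + d) % n for d in range(-radius, radius + 1)]
            dance.append(max(area, key=fitness))  # minimal local search: scan the area
        dance.sort(key=fitness, reverse=True)
        if fitness(dance[0]) > fitness(best):
            best = dance[0]
        # Next reference: best danced document not yet exploited, else restart.
        fresh = [d for d in dance if d not in taboo]
        s_ref = fresh[0] if fresh else rng.randrange(n)
    return best, len(score)


if __name__ == "__main__":
    texts = [
        "bees dance to share food sources",
        "swarm intelligence for optimization",
        "web information retrieval with bees swarm optimization",
        "classic vector space retrieval model",
        "satisfiability problem solved by bees",
        "particle swarm optimization",
    ]
    docs = [Counter(t.split()) for t in texts]
    query = Counter("bees swarm retrieval".split())
    print(*bso_search(docs, query))  # prints: 2 6
```

On this six-document toy collection the bees' windows cover every document in the first iteration, so the sketch simply returns the best-scoring one. On larger collections the number of evaluated documents stays bounded by `max_iter * n_bees * (2 * radius + 1)` whatever the collection size. Index windows are only meaningful if neighbouring indices hold similar documents (for instance after clustering), which is a further assumption of this sketch.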
Motivated by the success and power of this meta-heuristic, and given that only a few heuristic search techniques have been applied to the information retrieval problem, we have designed a BSO algorithm, named BSO-IR, to explore this domain.
Three kinds of collections were tested: CACM, with 3,204 documents; RCV1, with 804,414 documents; and larger collections generated by our own process. The results are compared with those of the classical IR method.
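For contrast, the classical method can be pictured as exhaustive ranking. The sketch below assumes a vector-space model with raw term frequencies and cosine similarity; the weighting scheme and the names `exhaustive_rank` and `top_k` are assumptions, not taken from this paper. Every document is scored, so the cost grows linearly with the collection: 3,204 similarity computations per query on CACM and 804,414 on RCV1.

```python
import heapq
import math
from collections import Counter


def cosine(u, v):
    """Cosine similarity between two sparse term-frequency vectors (Counters)."""
    dot = sum(w * v[t] for t, w in u.items())
    nu = math.sqrt(sum(w * w for w in u.values()))
    nv = math.sqrt(sum(w * w for w in v.values()))
    return dot / (nu * nv) if nu and nv else 0.0


def exhaustive_rank(docs, query, top_k=3):
    """Score every document against the query; return the top_k (index, score) pairs.

    One similarity computation per document, i.e. len(docs) evaluations in total.
    """
    scores = ((i, cosine(d, query)) for i, d in enumerate(docs))
    return heapq.nlargest(top_k, scores, key=lambda pair: pair[1])


if __name__ == "__main__":
    texts = [
        "bees dance to share food sources",
        "swarm intelligence for optimization",
        "web information retrieval with bees swarm optimization",
        "classic vector space retrieval model",
        "satisfiability problem solved by bees",
        "particle swarm optimization",
    ]
    docs = [Counter(t.split()) for t in texts]
    query = Counter("bees swarm retrieval".split())
    for i, s in exhaustive_rank(docs, query, top_k=2):
        print(i, round(s, 3))  # prints "2 0.655", then "5 0.333"
```

`heapq.nlargest` keeps only the `top_k` best pairs while scanning, but it still has to score every document; that full scan is the cost a swarm-based search tries to avoid.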
2010 IEEE/WIC/ACM International Conference on Web Intelligence and Intelligent Agent Technology
978-0-7695-4191-4/10 $26.00 © 2010 IEEE
DOI 10.1109/WI-IAT.2010.179