Bees Swarm Optimization based Approach for Web Information Retrieval
Habiba Drias, Hadia Mosteghanemi
Department of Computer Science, USTHB, LRIA, Algiers, Algeria
hdrias@usthb.dz
Abstract
This paper deals with large-scale information retrieval, with the aim of contributing to web search. The document collections considered are huge and difficult to handle with classical approaches. The larger the collection, the more powerful the approach required. A Bees Swarm Optimization algorithm called BSO-IR is designed to explore the prohibitive number of documents and find the information needed by the user. Extensive experiments were performed on the CACM and RCV1 collections and on larger corpora in order to show the benefit of such an approach over the classical one. Performance in terms of solution quality and runtime is compared between BSO and exact algorithms. Numerical results show that BSO-IR outperforms previous work in terms of scalability while yielding comparable quality.
Keywords: web information retrieval; very large collections of documents; scalability; evolutionary algorithms; swarm intelligence; BSO; classic approach
1. Introduction
With the exponentially growing amount of information on the web, the classic search process is losing efficiency. Innovative information retrieval (IR) tools are becoming necessary to cope with the complexity induced by this tremendous volume of information. Many different research directions contribute to handling the complexity of the problem. Distributed information retrieval and Personalizing Information Source Selection are examples of these research directions. Recent works consider user and source profiles in order to restrict the search to the sources that have the same profile as the user [6,7]. In this manner, a lot of information is pruned and the response time of such systems is therefore reduced.
In this study, artificial intelligence approaches, and more precisely bee swarm optimization (BSO) algorithms, are designed for this purpose. We show through this work that evolutionary approaches may help to alleviate the complexity issue. The original BSO meta-heuristic was first introduced in [4] and applied successfully to the satisfiability problem. The same principles and framework are adapted to the problem of interest in the present study. The idea behind addressing web information retrieval with a BSO-based approach is to prune the prohibitive search space so that only interesting documents are browsed and results are obtained in a reasonable amount of time.
This meta-heuristic belongs to the vast and well-recognized domain of swarm intelligence. Many works have been undertaken in this area and applied in many public and industrial sectors. The most widely used methodology is particle swarm optimization, known as PSO. The present article develops a BSO approach, which differs from PSO and is inspired by the collective behavior of bees. BSO results from the aggregation of individual behaviors dictated by very simple rules. It presents a self-organized working model, based on a decentralized logic and founded on the cooperation of units that have only local information. Real bees communicate with one another by means of a dance. When a bee exploring a region finds a rich food source, it performs an active dance in order to draw the attention of its congeners. The discovered area is then exploited by the bees to the maximum, and they repeat this way of feeding until their needs are satisfied. Motivated by the success and the power of this meta-heuristic, and knowing that few heuristic search techniques have been studied for the information retrieval problem, we have designed a BSO algorithm, namely BSO-IR, to explore this useful domain.
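To make the foraging analogy more concrete, the following Python sketch shows one possible shape of a BSO-style search over a document collection. It is only an illustration under stated assumptions, not the BSO-IR algorithm itself: the function names (`similarity`, `bso_search`), the Jaccard scoring, the index-based neighbourhood, the taboo set and all parameter values are hypothetical choices made for this example.

```python
import random


def similarity(doc_terms, query_terms):
    """Toy relevance score: Jaccard overlap of term sets (an assumption)."""
    union = doc_terms | query_terms
    return len(doc_terms & query_terms) / len(union) if union else 0.0


def bso_search(docs, query, n_bees=5, local_steps=10, max_iter=20, seed=0):
    """Illustrative BSO-style loop; returns (best document index, its score).

    A "solution" here is simply a document index. Real BSO variants use
    richer solution encodings; this layout is hypothetical.
    """
    rng = random.Random(seed)
    n = len(docs)

    def score(i):
        return similarity(docs[i], query)

    reference = rng.randrange(n)   # initial reference solution
    best = reference
    taboo = set()                  # reference solutions already used

    for _ in range(max_iter):
        taboo.add(reference)

        # Search areas: one starting point per bee, spread over the collection.
        stride = max(1, n // n_bees)
        starts = [(reference + k * stride) % n for k in range(n_bees)]

        # Each bee runs a short local search and reports its final position
        # in the shared "dance" table.
        dance = []
        for start in starts:
            current = start
            for _ in range(local_steps):
                candidate = (current + rng.choice((-1, 1))) % n
                if score(candidate) >= score(current):
                    current = candidate
            dance.append(current)

        # The next reference is the best danced solution not yet used;
        # if every entry is taboo, restart from a random document.
        candidates = [d for d in dance if d not in taboo]
        if candidates:
            reference = max(candidates, key=score)
        else:
            reference = rng.randrange(n)

        if score(reference) > score(best):
            best = reference

    return best, score(best)
```

Calling `bso_search(docs, query)` with `docs` as a list of term sets and `query` as a term set returns the index and score of the best document found. Only a fraction of the collection is scored per iteration, which reflects the pruning idea described above, at the cost of no guarantee that the globally best document is found.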
Three kinds of collections have been tested: CACM with 3204 documents, RCV1 with 804 414 documents, and larger collections generated by our own process. A comparison with the classical IR method is performed.
2010 IEEE/WIC/ACM International Conference on Web Intelligence and Intelligent Agent Technology. 978-0-7695-4191-4/10 $26.00 © 2010 IEEE. DOI 10.1109/WI-IAT.2010.179