Prof. Anuradha D. Thakare, Mrs. Shruti M. Chaudhari / International Journal of Engineering Research and Applications (IJERA) ISSN: 2248-9622 www.ijera.com Vol. 2, Issue 6, November-December 2012, pp. 1455-1459

Introducing a Hybrid Swarm Intelligence Based Technique for Document Clustering

Prof. Anuradha D. Thakare*, Mrs. Shruti M. Chaudhari**
*(Department of Computer Engineering, Pune University, Pune-44)
**(Department of Computer Engineering, Pune University, Pune-44)

ABSTRACT
Swarm intelligence (SI) is widely used in many complex optimization problems. It models the collective behavior of social systems such as honey bees (bee algorithm, BA) and birds (particle swarm optimization, PSO). This paper presents a detailed overview of Particle Swarm Optimization (PSO), its variants, and the hybridization of PSO with the Bee Algorithm (BA). It also surveys various SI techniques presented by researchers. The objective is to apply the capability of this hybrid technique to document clustering, addressing the issues of clustering through modifications to the Bee Algorithm and Particle Swarm Optimization.

Keywords- Bee Algorithm (BA), Clustering, Genetic Algorithm (GA), Particle Swarm Optimization (PSO), Swarm Intelligence (SI).

I. INTRODUCTION
Clustering is an important unsupervised classification technique. In clustering, a set of patterns, usually vectors in a multi-dimensional space, is grouped into clusters in such a way that patterns in the same cluster are similar in some sense and patterns in different clusters are dissimilar in the same sense. For this it is necessary to first define a measure of similarity, which establishes a rule for assigning patterns to the domain of a particular cluster centre. One such measure of similarity is the Euclidean distance D between two patterns x and z, defined by $D = \lVert x - z \rVert$ [1]. The smaller the distance between x and z, the greater the similarity between the two, and vice versa.
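The distance measure and the assignment rule described above can be illustrated with a minimal sketch. It assumes patterns are plain lists of numbers; the function names `euclidean_distance` and `nearest_centre` are hypothetical and chosen only for this illustration.

```python
import math


def euclidean_distance(x, z):
    """Euclidean distance D = ||x - z|| between two equal-length patterns."""
    if len(x) != len(z):
        raise ValueError("patterns must have the same dimension")
    return math.sqrt(sum((xi - zi) ** 2 for xi, zi in zip(x, z)))


def nearest_centre(pattern, centres):
    """Index of the cluster centre with the smallest distance to the pattern."""
    return min(range(len(centres)),
               key=lambda k: euclidean_distance(pattern, centres[k]))
```

For example, `nearest_centre([1, 1], [[0, 0], [10, 10]])` assigns the pattern to the first centre, since it lies much closer to the origin than to (10, 10).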
Several clustering techniques are available in the literature. Some, like the widely used K-means algorithm, optimize a distance criterion, either by minimizing the within-cluster spread (as implemented in this article) or by maximizing the inter-cluster separation. Other techniques, such as the graph-theoretical approach and the hierarchical approach, perform clustering based on other criteria.

The concept of clustering has been around for a long time. It has several applications, particularly in information retrieval and in organizing web resources. The main purpose of clustering is to locate information and, in the present-day context, to locate the most relevant electronic resources. Research in clustering eventually led to automatic indexing, used both to index and to retrieve electronic records. Clustering groups objects that are similar in their characteristics; its ultimate aim is to provide a grouping of similar records. Clustering is often confused with classification, but the two differ: in classification the objects are assigned to predefined classes, whereas in clustering the classes are formed from the data. Even so, the term “class” is frequently used as a synonym for the term “cluster” [1].

Extensive comparative studies of different clustering methods suggest that there is no general strategy which works equally well in different problem domains. However, it has been found that it is usually beneficial to run simpler schemes several times, rather than to use very complex schemes that need to be run only once. An intuitively simple criterion is the within-cluster spread, which, as in the K-means algorithm, needs to be minimized for good clustering.
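As a rough illustration of these two ideas (minimizing the within-cluster spread, and running a simple scheme several times), the sketch below runs a basic K-means from several random starting configurations and keeps the run with the smallest spread. It is not the technique proposed in this paper. The function names, the restart count and the seed are assumptions made for the example, and the spread is taken here as the sum of squared Euclidean distances to the assigned centres.

```python
import random


def sq_dist(x, z):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((a - b) ** 2 for a, b in zip(x, z))


def within_cluster_spread(points, centres, labels):
    """Sum of squared distances from each point to its assigned centre."""
    return sum(sq_dist(p, centres[k]) for p, k in zip(points, labels))


def kmeans(points, k, rng, max_iter=100):
    """Basic K-means started from k randomly chosen points."""
    centres = [list(p) for p in rng.sample(points, k)]
    labels = None
    for _ in range(max_iter):
        # Assignment step: each point goes to its nearest centre.
        new_labels = [min(range(k), key=lambda j: sq_dist(p, centres[j]))
                      for p in points]
        if new_labels == labels:
            break  # assignments are stable, so the centres are too
        labels = new_labels
        # Update step: move each centre to the mean of its members.
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:  # an empty cluster keeps its previous centre
                centres[j] = [sum(col) / len(members) for col in zip(*members)]
    return centres, labels


def best_of_restarts(points, k, restarts=10, seed=0):
    """Run K-means several times and keep the lowest-spread result."""
    rng = random.Random(seed)
    best = None
    for _ in range(restarts):
        centres, labels = kmeans(points, k, rng)
        spread = within_cluster_spread(points, centres, labels)
        if best is None or spread < best[0]:
            best = (spread, centres, labels)
    return best
```

Calling `best_of_restarts(points, 2)` on a list of coordinate lists returns a tuple `(spread, centres, labels)`. Restarting reduces the dependence on any single starting configuration but does not remove it.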
However, unlike the K-means algorithm, which may get stuck at values that are not optimal, the proposed technique should be able to provide good results irrespective of the starting configuration [2]. Document clustering is the process of grouping documents into a number of clusters. The goal of document clustering is to make the documents in the same cluster share a high degree of similarity while being very dissimilar to documents from other clusters [3].

Swarm Intelligence and GA for Clustering:
Swarm Intelligence (SI) is a computational intelligence technique for solving complex real-world problems. It involves the study of the collective behavior of individuals in a population. The individuals interact locally with one another and with their environment in a decentralized control system. The term SI has come to represent the idea that it is possible to control and manage complex systems of interacting entities even though the interactions between and among the entities being controlled are, in some sense, minimal [4]. This notion therefore lends itself to forms of distributed control that may be much more efficient, scalable and effective for