International Journal of Computer Applications (0975-8887), Volume 47, No. 21, June 2012

Agglomerative Ants for Data Clustering

Saroj Bala, AKG Engineering College, Ghaziabad, U.P., India
S. I. Ahson, Shobhit University, Meerut, U.P., India
R. P. Agarwal, Shobhit University, Meerut, U.P., India

ABSTRACT
Clustering is a data mining technique for the analysis of data in areas such as pattern recognition, image processing, information science and bioinformatics. Hierarchical clustering techniques form clusters using either a top-down or a bottom-up approach. Hierarchical agglomerative clustering is a bottom-up clustering method. Ant-based clustering methods form clusters by picking up and dropping objects according to their surroundings. This paper proposes an agglomerative clustering algorithm, AGG_ANTS, based on ant colonies. AGG_ANTS clusters the objects by moving ants on the grid and merging their loads according to similarity, resulting in bigger clusters. It avoids repeatedly calculating similarity in the surroundings and repeatedly picking up and dropping objects, which results in a more efficient algorithm.

Keywords: Clustering, Hierarchical, Agglomerative, Ant colony

1. INTRODUCTION
Clustering is the task of assigning a set of objects to clusters so that the objects in the same cluster are more similar to each other than to those in other clusters. The four main classes of clustering algorithms are partitioning methods, hierarchical methods, density-based methods and grid-based methods [8]. Hierarchical algorithms build clusters gradually. Hierarchical clustering is further subdivided into agglomerative and divisive methods. An agglomerative approach begins with each pattern in a distinct cluster and successively merges clusters until a stopping criterion is satisfied. A divisive method begins with all patterns in a single cluster and performs splitting until a stopping criterion is met.
Most hierarchical clustering algorithms are variants of the single-link and complete-link algorithms. These two algorithms differ in the way they characterize the similarity between a pair of clusters. In the single-link method, the distance between two clusters is the minimum of the distances between all pairs of patterns drawn from the two clusters (one pattern from the first cluster, the other from the second). In the complete-link algorithm, the distance between two clusters is the maximum of all pairwise distances between patterns in the two clusters. In either case, two clusters merge to form a larger cluster based on a minimum-distance criterion.

Ant-based algorithms [1] are becoming popular. Ant-based clustering is inspired by brood sorting in ant colonies. In these algorithms, artificial ants make heaps of objects just as real ants make heaps of dead bodies. Ants move randomly in the workspace. If an unloaded ant encounters an object, it picks the object up if that object is surrounded by dissimilar objects. It continues walking and drops the object where it finds objects similar to the one it is carrying. The pick and drop actions result in clusters of similar objects without any initial knowledge of the number of clusters.

Here, an agglomerative clustering method based on ant colonies is proposed. Ants agglomerate when their loads are similar; otherwise they keep walking on the workspace. This paper is organized as follows: Section 2 explains the hierarchical agglomerative clustering algorithm. Ant-based clustering research is discussed in Section 3. The proposed method is introduced in Section 4. Sections 5 and 6 present the experimental results and the conclusion.

2. HIERARCHICAL AGGLOMERATIVE CLUSTERING
Hierarchical agglomerative clustering is a bottom-up clustering method. It starts with each object in its own cluster.
Then, in each successive iteration, it agglomerates the closest pair of clusters according to some similarity criterion, until all of the data is in one cluster. The procedure is as follows:
1. Initially, each item $x_1, x_2, x_3, \ldots, x_n$ is in its own cluster $C_1, C_2, C_3, \ldots, C_n$.
2. Repeat until there is only one cluster left:
3. Merge the nearest clusters $C_i$ and $C_j$.

The concept of nearest clusters may be based on different linkage variations, as follows:

Single-linkage: $d(C_i, C_j) = \min_{x_i \in C_i,\, x_j \in C_j} d(x_i, x_j)$

Complete-linkage: $d(C_i, C_j) = \max_{x_i \in C_i,\, x_j \in C_j} d(x_i, x_j)$

Average-linkage: $d(C_i, C_j) = \dfrac{\sum_{x_i \in C_i} \sum_{x_j \in C_j} d(x_i, x_j)}{|C_i|\,|C_j|}$

3. ANT BASED CLUSTERING
Deneubourg et al. [2] first proposed the basic ant model for clustering. They focused on clustering objects by using a group of real-world robots. In their model, the ants walk randomly on the workspace, picking up or dropping one data element at a time. The ants possess only local perceptual capabilities: they can sense whether the surrounding objects are similar to the object they are carrying. Based on this information, they perform the pick or drop action. Gutowitz [4] improved this model by giving the ants the capacity to sense the complexity of their neighborhood. The ants would not try to pick up or drop anything in areas with low complexity. These complexity-seeking ants were able to avoid actions that did not contribute to the clustering process, and so performed their task more efficiently.
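The pick and drop behaviour of the basic ant model described above can be illustrated with a minimal Python sketch. It is not taken from this paper. It assumes that items carry discrete labels, that two items are similar only when their labels are equal, and that the pick and drop probabilities take the threshold form commonly attributed to the basic model. The constants K_PICK and K_DROP, the dictionary-based grid and all function names are hypothetical choices made for illustration.

```python
import random

K_PICK = 0.1  # assumed threshold constant for picking up an item
K_DROP = 0.3  # assumed threshold constant for dropping an item


def similar_fraction(grid, pos, label, radius=1):
    """Fraction of the cells around `pos` that hold an item labelled `label`."""
    x0, y0 = pos
    cells = [(x0 + dx, y0 + dy)
             for dx in range(-radius, radius + 1)
             for dy in range(-radius, radius + 1)
             if (dx, dy) != (0, 0)]
    matches = sum(1 for cell in cells if grid.get(cell) == label)
    return matches / len(cells)


def pick_probability(f, k=K_PICK):
    """High when few similar items are nearby (f close to 0)."""
    return (k / (k + f)) ** 2


def drop_probability(f, k=K_DROP):
    """High when many similar items are nearby (f close to 1)."""
    return (f / (k + f)) ** 2


def decide(grid, ant, rng=random):
    """One pick or drop decision for an ant at ant['pos'].

    `grid` maps (x, y) cells to item labels; the random walk is omitted.
    """
    pos = ant["pos"]
    if ant["load"] is None and pos in grid:
        # Unloaded ant on an item: pick it up if its surroundings are dissimilar.
        f = similar_fraction(grid, pos, grid[pos])
        if rng.random() < pick_probability(f):
            ant["load"] = grid.pop(pos)
    elif ant["load"] is not None and pos not in grid:
        # Loaded ant on an empty cell: drop the item among similar items.
        f = similar_fraction(grid, pos, ant["load"])
        if rng.random() < drop_probability(f):
            grid[pos] = ant["load"]
            ant["load"] = None
```

Under these assumptions, an isolated item is always picked up, because the similar fraction is 0 and the pick probability is 1, while a carried item is never dropped on a cell that has no similar neighbours. Every call to decide recomputes the neighbourhood similarity, which is the repeated cost that the proposed method aims to avoid.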