Aggregation pheromone density based data clustering

Ashish Ghosh a,*, Anindya Halder b, Megha Kothari c, Susmita Ghosh c

a Machine Intelligence Unit and Center for Soft Computing Research, Indian Statistical Institute, Kolkata, India
b Center for Soft Computing Research, Indian Statistical Institute, Kolkata, India
c Department of Computer Science and Engineering, Jadavpur University, Kolkata, India

Received 17 May 2007; received in revised form 3 December 2007; accepted 25 February 2008

Abstract

Ants, bees and other social insects deposit pheromone (a type of chemical) in order to communicate among the members of their community. A pheromone that causes clumping or clustering behavior in a species and brings individuals into closer proximity is called an aggregation pheromone. This article presents a new algorithm (called APC) for clustering data sets based on this property of the aggregation pheromone found in ants. An ant is placed at the location of each data point, and the ants are allowed to move in the search space to find points with higher pheromone density. The movement of an ant is governed by the amount of pheromone deposited at different points of the search space. The more pheromone deposited, the greater the aggregation of ants. This leads to the formation of homogeneous groups of data. The proposed algorithm is evaluated on a number of well-known benchmark data sets using different cluster validity measures. Results are compared with those obtained using two popular standard clustering techniques, namely the average linkage agglomerative and k-means clustering algorithms, and with an ant-based method called adaptive time-dependent transporter ants for clustering (ATTA-C). Experimental results demonstrate the potential of the proposed APC algorithm, in terms of both solution (clustering) quality and execution time, compared with the other algorithms on a large number of data sets.

© 2008 Elsevier Inc. All rights reserved.
Keywords: Aggregation pheromone; Ant colony optimization; Swarm intelligence; Data clustering

* Corresponding author. E-mail address: ash@isical.ac.in (A. Ghosh).
Information Sciences 178 (2008) 2816–2831. doi:10.1016/j.ins.2008.02.015

1. Introduction and motivation

A wide variety of clustering algorithms have been proposed in the literature for different applications [18]. The fundamental problem of clustering is to partition a given data set into groups such that the data points in a cluster are more similar to each other than to points in different clusters [18]. In the clustering process, there are no predefined classes or training patterns that would show what kind of relations should hold among the data; this feature distinguishes clustering from classification [18]. Many clustering methods are available in the literature, and they can be broadly classified into the following types [18]: (i) partitional clustering, (ii) hierarchical clustering, (iii) density based clustering, (iv) graph based clustering, and (v) model based cluster-