Enhanced Swarm-like Agents for Dynamically Adaptive Data Clustering
SHERIN M. YOUSSEF, MOHAMED RIZK, MOHAMED El-SHERIF
Department of Computer Engineering
College of Engineering and Technology, Arab Academy for Science and Technology
Alexandria, EGYPT
sherin@aast.edu, mrmrizk@ieee.org, elsherif@aast.edu

Abstract: - Data clustering algorithms play an important role in effectively navigating, summarizing, and organizing information. Inspired by the self-organized behaviour of bird flocks, we propose a new dynamic clustering approach based on Particle Swarm Optimization (PSO). In this paper, we introduce the PSDC approach, in which new particle swarm-like agents perform multidimensional data clustering. Unlike other partitional clustering algorithms, this technique does not require initial seed partitions, and it can dynamically adapt to changes in the global shape or size of the clusters. The agents have several useful capabilities, such as sensing, thinking, making decisions, and moving freely in the solution space. The moving swarm-like agents are guided by a set of proposed navigation rules. Numerous experiments have been conducted on both synthetic and real datasets to evaluate the efficiency of the proposed model, and cluster validity approaches are used to evaluate the clustering results quantitatively. The experimental results show that the proposed particle swarm-like clustering algorithm reaches good clustering solutions and outperforms the other clustering algorithms it was compared against.

Key-Words: - Agents, clustering, Ant clustering, k-means.

1 Introduction
Data clustering [1, 2] is the process of identifying natural groupings, or clusters, within multidimensional data based on some similarity measure (e.g., Euclidean distance) [1, 3]. Clustering algorithms are used in many applications, such as data mining, compression, image segmentation, and machine learning. A cluster is usually identified by a cluster center (or centroid).
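As a concrete illustration of the similarity measure and centroid described above, the following minimal Python sketch computes Euclidean distances, assigns each point to its nearest center, and recomputes a cluster's centroid as the component-wise mean. The function names (`euclidean`, `assign`, `centroid`) and the toy data are hypothetical illustrations, not part of the PSDC method.

```python
import math
from typing import List, Sequence

Point = Sequence[float]  # a point is any sequence of coordinates


def euclidean(a: Point, b: Point) -> float:
    """Euclidean distance between two points of equal dimension."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def assign(points: List[Point], centers: List[Point]) -> List[int]:
    """Return, for every point, the index of its nearest center."""
    return [min(range(len(centers)), key=lambda k: euclidean(p, centers[k]))
            for p in points]


def centroid(points: List[Point]) -> List[float]:
    """Component-wise mean of a non-empty group of points."""
    n = len(points)
    return [sum(coords) / n for coords in zip(*points)]


# Hypothetical toy data: two well-separated groups in 2-D.
data = [(1.0, 1.0), (1.5, 2.0), (8.0, 8.0), (9.0, 8.5)]
centers = [(1.0, 1.0), (9.0, 9.0)]
labels = assign(data, centers)
print(labels)  # [0, 0, 1, 1]
print(centroid([p for p, lab in zip(data, labels) if lab == 0]))  # [1.25, 1.5]
```

Nearest-center assignment followed by centroid recomputation is the basic step shared by centroid-based partitional methods such as k-means.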
Classical clustering algorithms are static, centralized, and batch. They are static because they assume that the data and the similarity function do not change while clustering is taking place. They are centralized because they rely on data structures (such as similarity matrices [1, 3]) that must be accessed, and sometimes modified, at each step of the operation. They are batch because they run their course once and then stop.

In general terms, clustering is the unsupervised classification of patterns (observations, data items, or feature vectors) into groups (clusters). It is a difficult problem because the clusters in data may have different shapes and sizes; furthermore, the number of clusters to be formed is usually not known in advance. Most clustering algorithms are based on two popular techniques: hierarchical and partitional clustering [1]. In hierarchical clustering, the output is "a tree showing a sequence of clustering with each clustering being a partition of the data set". Such algorithms have the following advantages [2]: 1) the number of clusters need not be specified a priori, and 2) they are independent of the initial conditions. However, hierarchical clustering techniques suffer from two drawbacks: 1) they are static, i.e., data points assigned to a cluster cannot move to another cluster, and 2) they may fail to separate overlapping clusters due to a lack of information about the global shape or size of the clusters.

On the other hand, partitional clustering algorithms [2, 4] partition the data set into a specified number of clusters. These algorithms try to minimize a certain criterion (e.g., a square error function) and can therefore be treated as optimization problems. The advantages of hierarchical algorithms are the disadvantages of partitional algorithms, and vice versa.

Particle swarm optimization (PSO) was originally developed by Eberhart and Kennedy in 1995 [5] and was inspired by the social behaviour of a flock of birds.
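To make the optimization view of partitional clustering concrete, the sketch below evaluates the square error criterion J = Σ_k Σ_{x ∈ C_k} ||x − m_k||², where each point x is counted in the cluster C_k of its nearest center m_k, and minimizes J with a minimal global-best PSO in which every particle encodes k candidate centers. This is the commonly used inertia-weight form of PSO applied to centroid search, not the PSDC agents proposed in this paper. The parameter values (w = 0.72, c1 = c2 = 1.49), the swarm size, the iteration count, and all function names are illustrative assumptions.

```python
import random
from typing import List, Sequence, Tuple

Point = Sequence[float]
Centers = List[List[float]]  # k candidate centers, each a coordinate list


def sse(data: List[Point], centers: Centers) -> float:
    """Square error: squared distance of every point to its nearest center."""
    return sum(min(sum((x - c) ** 2 for x, c in zip(p, ctr)) for ctr in centers)
               for p in data)


def pso_cluster(data: List[Point], k: int, n_particles: int = 10,
                iters: int = 100, w: float = 0.72, c1: float = 1.49,
                c2: float = 1.49, seed: int = 0) -> Tuple[Centers, float]:
    """Minimal global-best PSO over centroid positions (not the PSDC method).

    Parameter defaults are illustrative assumptions, not values from the paper.
    """
    rng = random.Random(seed)
    dim = len(data[0])
    lo = [min(p[d] for p in data) for d in range(dim)]
    hi = [max(p[d] for p in data) for d in range(dim)]

    def random_centers() -> Centers:
        # Initialize k centers uniformly inside the data's bounding box.
        return [[rng.uniform(lo[d], hi[d]) for d in range(dim)] for _ in range(k)]

    pos = [random_centers() for _ in range(n_particles)]
    vel = [[[0.0] * dim for _ in range(k)] for _ in range(n_particles)]
    pbest = [[c[:] for c in p] for p in pos]  # personal bests (copies)
    pbest_f = [sse(data, p) for p in pos]
    g = min(range(n_particles), key=lambda i: pbest_f[i])
    gbest, gbest_f = [c[:] for c in pbest[g]], pbest_f[g]

    for _ in range(iters):
        for i in range(n_particles):
            for j in range(k):
                for d in range(dim):
                    r1, r2 = rng.random(), rng.random()
                    # Inertia + cognitive (pbest) + social (gbest) terms.
                    vel[i][j][d] = (w * vel[i][j][d]
                                    + c1 * r1 * (pbest[i][j][d] - pos[i][j][d])
                                    + c2 * r2 * (gbest[j][d] - pos[i][j][d]))
                    pos[i][j][d] += vel[i][j][d]
            f = sse(data, pos[i])
            if f < pbest_f[i]:
                pbest[i], pbest_f[i] = [c[:] for c in pos[i]], f
                if f < gbest_f:  # gbest_f <= pbest_f[i] always holds
                    gbest, gbest_f = [c[:] for c in pos[i]], f
    return gbest, gbest_f


# Hypothetical toy data with two groups.
data = [(1.0, 1.0), (1.5, 2.0), (1.2, 1.4), (8.0, 8.0), (9.0, 8.5), (8.5, 9.2)]
centers, err = pso_cluster(data, k=2)
print(sorted(centers), err)
```

The swarm only needs the objective value, so the number of clusters k must still be fixed in advance here; positions and velocities are not clamped, and the global best is updated as soon as any particle improves on it.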
In the PSO algorithm, the birds in a flock are symbolically represented as particles. These particles can be considered as simple agents
2nd WSEAS Int. Conf on COMPUTER ENGINEERING and APPLICATIONS (CEA'08) Acapulco, Mexico, January 25-27, 2008 ISSN: 1790-5117 Page 213 ISBN: 978-960-6766-33-6