Parallel Unsupervised k-Windows: An Efficient Parallel Clustering Algorithm

Dimitris K. Tasoulis 1,2, Panagiotis D. Alevizos 1,2, Basilis Boutsinas 2,3, and Michael N. Vrahatis 1,2

1 Department of Mathematics, University of Patras, GR-26500 Patras, Greece
{dtas, alevizos, vrahatis}@math.upatras.gr
2 University of Patras Artificial Intelligence Research Center (UPAIRC), University of Patras, GR-26500 Patras, Greece
3 Department of Business Administration, University of Patras, GR-26500 Patras, Greece
vutsinas@bma.upatras.gr

Abstract. Clustering can be defined as the process of partitioning a set of patterns into disjoint and homogeneous meaningful groups (clusters). There is a growing need for parallel algorithms in this field, since databases of huge size are common nowadays. This paper presents a parallel version of a recently proposed algorithm that scales very well in parallel environments.

1 Introduction

Clustering, that is, the partitioning of a set of patterns into disjoint and homogeneous meaningful groups (clusters), is a fundamental process in the practice of science. In particular, clustering is fundamental in knowledge acquisition. It is applied in various fields, including data mining [6], statistical data analysis [1], and compression and vector quantization [15]. Clustering is also widely applied in most social sciences.

The task of extracting knowledge from large databases, in the form of clustering rules, has attracted considerable attention. Due to the growing size of databases, there is also an increasing interest in the development of parallel implementations of data clustering algorithms. Parallel approaches to clustering can be found in [9,10,12,14,16].

Recent software advances [7,11] have made it possible to use collections of heterogeneous computers as a coherent and flexible concurrent computational resource.
The vast number of individual personal computers available in most scientific laboratories suffices to provide the necessary hardware. These pools of computational power rely on network interfaces to link the individual computers. Since network infrastructure is currently too immature to support high-speed data transfer, it constitutes a bottleneck for the entire system. Applications that can exploit the specific strengths of individual machines on a network, while minimizing the required data transfer rate, are therefore best suited to these environments. The results reported in the present paper indicate that the recently proposed k-windows algorithm [17] scales very well in such environments.

V. Malyshkin (Ed.): PaCT 2003, LNCS 2763, pp. 336–344, 2003. © Springer-Verlag Berlin Heidelberg 2003