International Journal of Neural Systems, Vol. 0, No. 0 (April, 2000) 00–00
© World Scientific Publishing Company

ADAPTIVE K-MEANS ALGORITHM FOR OVERLAPPED GRAPH CLUSTERING

Gema Bello-Orgaz†, Héctor D. Menéndez‡ and David Camacho*
Computer Science Department
Escuela Politecnica Superior, Universidad Autónoma de Madrid
28049, Madrid, Spain
†gema.bello@uam.es
‡hector.menendez@uam.es
*david.camacho@uam.es

Received (to be inserted
Revised by Publisher)

The graph clustering problem has become highly relevant due to the growing interest of several research communities in social networks and their possible applications. Overlapped graph clustering algorithms try to find subsets of nodes that can belong to different clusters. In social-based applications it is quite usual for a node of the network to belong to different groups, or communities, in the graph. Therefore, algorithms that try to discover, or analyse, the behaviour of these networks need to handle this feature, detecting and identifying the overlapped nodes. This paper presents a soft clustering approach based on a genetic algorithm, where a new encoding is designed to achieve two main goals: first, the automatic adaptation of the number of communities that can be detected; second, the definition of several fitness functions that guide the search process using measures extracted from graph theory. Finally, our approach has been experimentally tested using the Eurovision contest dataset, a well-known social-based data network, to show how overlapped communities can be found using our method.

Keywords: graph clustering, overlapped clustering, genetic algorithms, clustering coefficient, community finding, social networks.

1. Introduction

The clustering problem can be described as a blind search on a collection of unlabelled data, where elements with similar features are grouped together in sets.
There are three main techniques to deal with the clustering problem 32: overlapping 12 (or non-exclusive), partitional 42 and hierarchical 37. Overlapping clustering allows each element to belong to multiple clusters; partitional clustering consists of a disjoint division of the data where each element belongs to exactly one cluster; and hierarchical clustering nests the clusters formed through a partitional clustering method, creating bigger partitions and grouping the clusters by hierarchical levels. This work focuses on overlapping clustering techniques, trying to “relax” a well-known classical partitional technique named K-means using a genetic algorithm approach. K-means is a clustering algorithm that uses a fixed number (K) of clusters and looks for the best division of the dataset (through a predefined metric or distance) into this number of groups.

* Corresponding author.

Several clustering algorithms, such as K-means, have been improved using genetic algorithms 32. A genetic algorithm is inspired by biological evolution 38: the possible problem solutions are represented as individuals belonging to a population. The individuals are encoded using a set of chromosomes (called the genotype of the genome). Later, these individuals are evolved during a number of genera-