A Load Balancing Knapsack Algorithm for Parallel Fuzzy c-Means Cluster Analysis

Marta V. Modenesi, Alexandre G. Evsukoff, Myrian C. A. Costa

COPPE/Federal University of Rio de Janeiro, P.O. Box 68506, 21945-970 Rio de Janeiro RJ, Brazil
Tel: (+55) 21 25627388, Fax: (+55) 21 25627392
modenesi@hotmail.com, evsukoff@coc.ufrj.br, myrian@nacad.ufrj.br

Abstract. This work proposes a load balancing algorithm for parallel processing based on a variation of the classical knapsack problem. The problem considers the distribution of a set of partitions, defined by the number of clusters, over a set of processors, attempting to achieve a minimal overall processing cost. The work is an optimization of the parallel fuzzy c-means (FCM) cluster analysis algorithm proposed in a previous work and is composed of two distinct parts: the cluster analysis proper, which uses the FCM algorithm to calculate the cluster centers and the PBM index to evaluate partitions, and the load balancing, which is modeled as a multiple knapsack problem and implemented through a heuristic that incorporates the restrictions related to cluster analysis in order to make the parallel process more efficient.

Topics of Interest: Unsupervised Classification, Fuzzy c-Means, Load Balance, Optimization.

1. Introduction

Cluster analysis is the unsupervised classification of data into groups (clusters), and it is one of the most computationally intensive tasks in data mining. It is thus very attractive for parallel processing, and many parallel and distributed clustering algorithms have recently been studied [1][2][3].

Several approaches to cluster analysis algorithms have been studied [4]. In the partition approach, two main optimization problems are addressed: finding the number of clusters present in the data and finding the location of the cluster centers.
The latter problem is much easier to solve, and iterative greedy optimization algorithms, such as the k-means algorithm and its variants, are widely used for that purpose and are well known in the data mining community. The k-means algorithm was extended to the fuzzy c-means algorithm by Bezdek in the early eighties [5]. The fuzzy c-means (FCM) algorithm computes a