Modified global k-means algorithm for minimum sum-of-squares clustering problems

Adil M. Bagirov
Centre for Informatics and Applied Optimization, School of Information Technology and Mathematical Sciences, University of Ballarat, Victoria, 3353, Australia
E-mail: a.bagirov@ballarat.edu.au, Tel.: +61 3 5327 9330, Fax: +61 3 5327 9289

Abstract

The k-means algorithm and its variations are known to be fast clustering algorithms. However, they are sensitive to the choice of starting points and are inefficient for solving clustering problems in large data sets. Recently, a new version of the k-means algorithm, the global k-means algorithm, has been developed. It is an incremental algorithm that dynamically adds one cluster center at a time and uses each data point as a candidate for the k-th cluster center. Results of numerical experiments show that the global k-means algorithm considerably outperforms the k-means algorithm. In this paper, a new version of the global k-means algorithm is proposed, in which the starting point for the k-th cluster center is computed by minimizing an auxiliary cluster function. Results of numerical experiments on 14 data sets demonstrate the superiority of the new algorithm; however, it requires more computational time than the global k-means algorithm.

Keywords: minimum sum-of-squares clustering, nonsmooth optimization, k-means algorithm, global k-means algorithm.

1 Introduction

Cluster analysis deals with the problem of organizing a collection of patterns into clusters based on similarity. It is also known as the unsupervised classification of patterns and has found many applications in different areas. In cluster analysis we assume that we are given a finite set of points $A$ in the $n$-dimensional space $\mathbb{R}^n$, that is, $A = \{a^1, \ldots, a^m\}$, where $a^i \in \mathbb{R}^n$, $i = 1, \ldots, m$. There are different types of clustering.
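The abstract describes the global k-means algorithm as an incremental procedure that adds one cluster center at a time and tries every data point as a candidate for the k-th center. A minimal pure-Python sketch of that idea follows. It assumes plain Lloyd-style k-means iterations as the local solver and scores solutions with the standard minimum sum-of-squares objective (the sum of squared Euclidean distances from each point to its nearest center). It illustrates the baseline global k-means scheme, not the modified algorithm proposed in this paper, and the helper names (`kmeans`, `global_kmeans`, `sse`) are hypothetical.

```python
def sq_dist(p, q):
    # Squared Euclidean distance between two points of equal dimension.
    return sum((x - y) ** 2 for x, y in zip(p, q))


def sse(points, centers):
    # Assumed objective: sum of squared distances to the nearest center.
    return sum(min(sq_dist(p, c) for c in centers) for p in points)


def kmeans(points, centers, max_iter=100):
    # Lloyd-style k-means: alternate assignment and centroid update until
    # the centers stop changing or max_iter is reached.
    centers = [list(c) for c in centers]
    for _ in range(max_iter):
        labels = [min(range(len(centers)), key=lambda j: sq_dist(p, centers[j]))
                  for p in points]
        new_centers = []
        for j, c in enumerate(centers):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:
                new_centers.append([sum(col) / len(members) for col in zip(*members)])
            else:
                new_centers.append(c)  # an empty cluster keeps its old center
        if new_centers == centers:
            break
        centers = new_centers
    return centers, sse(points, centers)


def global_kmeans(points, k):
    # Hypothetical sketch of global k-means: start from the centroid of the
    # whole set (the optimal single center), then for each new center try
    # every data point as its starting position and keep the best result.
    centers = [[sum(col) / len(points) for col in zip(*points)]]
    for _ in range(2, k + 1):
        best_centers, best_f = None, None
        for a in points:
            cand, f = kmeans(points, centers + [list(a)])
            if best_f is None or f < best_f:
                best_centers, best_f = cand, f
        centers = best_centers
    return centers, sse(points, centers)
```

For example, `global_kmeans([(0.0, 0.0), (0.0, 1.0), (10.0, 0.0), (10.0, 1.0)], 2)` separates the two vertical pairs. Each incremental step runs k-means once per data point, which is why this scheme becomes expensive on large data sets.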
In this paper, we consider the hard unconstrained partition clustering problem, that is, the distribution of the points of the set $A$ into a given number $k$ of disjoint subsets $A^j$, $j = 1, \ldots, k$, with respect to predefined criteria such that:
1) $A^j \neq \emptyset$, $j = 1, \ldots, k$;
2) $A^j \cap A^l = \emptyset$, $j, l = 1, \ldots, k$, $j \neq l$;