P. Perner (Ed.): MLDM 2013, LNAI 7988, pp. 246–259, 2013. © Springer-Verlag Berlin Heidelberg 2013

SOM++: Integration of Self-Organizing Map and K-Means++ Algorithms

Yunus Dogan, Derya Birant, and Alp Kut

Dokuz Eylul University, Department of Computer Engineering, Tinaztepe Campus, Buca, 35397 Izmir, Turkey
{yunus,derya,alp}@cs.deu.edu.tr

Abstract. Data clustering is an important and widely used data mining task that groups similar items together into subsets. This paper introduces a new clustering algorithm, SOM++, which first uses the K-Means++ method to determine the initial weight values and the starting points, and then uses the Self-Organizing Map (SOM) to find the final clustering solution. The purpose of this algorithm is to improve data clustering solutions in terms of runtime, the rate of unstable data points, and internal error. This paper also compares our algorithm with simple SOM and K-Means + SOM on a real-world data set. The results show that SOM++ performs well in stability and significantly outperforms the other methods in training time.

Keywords: Data Mining, Clustering, Self-Organizing Map, K-Means++, Mining Methods and Algorithms.

1 Introduction

Cluster analysis is the process of grouping data into subsets such that each item in a cluster is more similar to the other items in the same cluster than to items outside it. Generally, distance measures such as the Euclidean and Manhattan distances are used to evaluate the dissimilarity between data points. Cluster analysis is one of the most useful tasks in machine learning and data mining, and has been used in a variety of fields such as marketing, banking, medicine and telecommunication. It has been widely used in dimensionality reduction, information extraction, density approximation and data compression [15] [6] [7] [16].
The K-Means [12] algorithm is the most commonly used partitioning clustering algorithm, owing to its easy implementation and efficient execution time. The self-organizing map (SOM) [11] is an unsupervised, well-established and widely used clustering technique. In SOM, the initial weight values are assigned randomly; the method's performance is sensitive to these values, and it is prohibitively slow in large data applications. In order to decrease the time complexity of SOM, we investigated different initialization procedures for an optimal SOM and now propose K-Means++ as the most convenient method, given proper training parameters.
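To make the initialization idea concrete, the following is a minimal sketch of the standard K-Means++ seeding step (sampling each new seed with probability proportional to its squared distance from the nearest seed already chosen). It is an illustration under stated assumptions rather than the exact procedure of SOM++: the function names `squared_euclidean` and `kmeanspp_seeds`, the use of squared Euclidean distance, and the plain-list data representation are all hypothetical choices made for this example.

```python
import random


def squared_euclidean(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeanspp_seeds(points, k, rng=None):
    """Choose k seed vectors from `points` by K-Means++ style sampling.

    Hypothetical helper for illustration: the first seed is drawn uniformly,
    and each later seed is drawn with probability proportional to its squared
    distance from the nearest seed chosen so far.
    """
    rng = rng or random.Random()
    if not 1 <= k <= len(points):
        raise ValueError("k must be between 1 and the number of points")

    # First seed: uniform random choice among the data points.
    seeds = [list(rng.choice(points))]
    # d2[i] holds the squared distance from points[i] to its nearest seed.
    d2 = [squared_euclidean(p, seeds[0]) for p in points]

    while len(seeds) < k:
        if sum(d2) == 0.0:
            # Every point coincides with a seed; fall back to a uniform draw.
            idx = rng.randrange(len(points))
        else:
            # Weighted draw: distant points are more likely to be selected.
            idx = rng.choices(range(len(points)), weights=d2, k=1)[0]
        seeds.append(list(points[idx]))
        # Update each point's distance to its nearest seed.
        d2 = [min(old, squared_euclidean(p, seeds[-1]))
              for old, p in zip(d2, points)]
    return seeds


if __name__ == "__main__":
    data = [[0.0, 0.0], [0.1, 0.0], [5.0, 5.0],
            [5.1, 5.0], [10.0, 0.0], [10.0, 0.1]]
    for seed in kmeanspp_seeds(data, 3, random.Random(0)):
        print(seed)
```

Under this sketch, the returned seeds could serve as initial weight vectors for SOM units in place of random values; how the seeds are assigned to positions on the map is not specified here and would need to follow the algorithm description.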