P. Perner (Ed.): MLDM 2013, LNAI 7988, pp. 246–259, 2013.
© Springer-Verlag Berlin Heidelberg 2013
SOM++: Integration of Self-Organizing Map
and K-Means++ Algorithms
Yunus Dogan, Derya Birant, and Alp Kut
Dokuz Eylul University, Department of Computer Engineering,
Tinaztepe Campus, Buca, 35397 Izmir, Turkey
{yunus,derya,alp}@cs.deu.edu.tr
Abstract. Data clustering is an important and widely used data mining task
that groups similar items into subsets. This paper introduces a new
clustering algorithm, SOM++, which first uses the K-Means++ method to determine
the initial weight values and the starting points, and then uses a Self-Organizing
Map (SOM) to find the final clustering solution. The purpose of this algorithm
is to improve data clustering in terms of runtime, the rate of unstable data
points and internal error. This paper also compares our algorithm with simple
SOM and K-Means + SOM using real-world data. The results show that
SOM++ performs well in stability and significantly outperforms three
other methods in training time.
Keywords: Data Mining, Clustering, Self-Organizing Map, K-Means++,
Mining Methods and Algorithms.
1 Introduction
Cluster analysis is the process of grouping data into subsets such that each item in a
cluster is more similar to the other items in that cluster than to the items outside
it. Distance measures such as Euclidean distance and Manhattan distance are
generally used to evaluate the dissimilarity between data points.
Cluster analysis is one of the most useful tasks in machine learning and data mining,
and has been used in a variety of fields such as marketing, banking, medicine and
telecommunication. It has been widely used in dimensionality reduction, information
extraction, density approximation and data compression [15] [6] [7] [16].
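As a brief illustration of the two dissimilarity measures named above, the following sketch computes both for a pair of points. The function names are chosen here for illustration only and do not come from the paper.

```python
import math

def euclidean(a, b):
    """Euclidean (L2) distance between two equal-length vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def manhattan(a, b):
    """Manhattan (L1) distance between two equal-length vectors."""
    return sum(abs(x - y) for x, y in zip(a, b))

p, q = (0.0, 0.0), (3.0, 4.0)
d_euclid = euclidean(p, q)     # straight-line distance
d_manhattan = manhattan(p, q)  # sum of per-coordinate differences
```

For the same pair of points, the Manhattan distance is never smaller than the Euclidean distance, so the choice of measure can change which cluster a point is judged closest to.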
The K-means algorithm [12] is the most commonly used partitioning clustering
algorithm, owing to its easy implementation and efficient execution time. The
self-organizing map (SOM) [11] is an unsupervised, well-established and widely used
clustering technique.
In SOM, the initial weight values are assigned randomly; the performance of the
method is sensitive to these values, and it is prohibitively slow in large data
applications. In order to decrease the time complexity of SOM, we investigated
different initialization procedures for an optimal SOM and now propose K-Means++
as the most convenient method, given proper training parameters.
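To make the idea of replacing random initial weights concrete, the sketch below applies the standard K-Means++ seeding rule: the first seed is chosen uniformly at random, and each later seed is drawn with probability proportional to its squared distance from the nearest seed already chosen. The seeds are then copied into a grid of SOM weight vectors. This is a minimal illustration under stated assumptions, not the authors' exact procedure: the function names, the one-seed-per-neuron mapping and the row-major grid layout are all hypothetical.

```python
import random

def sq_dist(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def kmeanspp_seeds(points, k, rng=None):
    """Pick k seed vectors from points using K-Means++ (D^2) sampling."""
    rng = rng or random.Random()
    seeds = [rng.choice(points)]  # first seed: uniform at random
    # d2[i] is the squared distance from points[i] to its nearest seed so far
    d2 = [sq_dist(p, seeds[0]) for p in points]
    while len(seeds) < k:
        if sum(d2) > 0:
            # draw an index with probability proportional to d2
            idx = rng.choices(range(len(points)), weights=d2, k=1)[0]
        else:
            # every point coincides with a seed; fall back to uniform choice
            idx = rng.randrange(len(points))
        seeds.append(points[idx])
        d2 = [min(d, sq_dist(p, points[idx])) for d, p in zip(d2, points)]
    return seeds

def init_som_weights(points, rows, cols, rng=None):
    """Assumed mapping: one K-Means++ seed per neuron, in row-major order."""
    seeds = kmeanspp_seeds(points, rows * cols, rng)
    return [[list(seeds[r * cols + c]) for c in range(cols)]
            for r in range(rows)]
```

A call such as `init_som_weights(data, 3, 3)` would return a 3 x 3 grid of weight vectors, each copied from a data point, which could then be passed to an ordinary SOM training loop in place of random values.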