CEM, EM, AND DAEM ALGORITHMS FOR LEARNING SELF-ORGANIZING MAPS

Shih-Sian Cheng 1,2, Hsin-Chia Fu 1, and Hsin-Min Wang 2
1 Department of Computer Science, National Chiao Tung University, Hsin-Chu, Taiwan, ROC
hcfu@csie.nctu.edu.tw
2 Institute of Information Science, Academia Sinica, Taipei, Taiwan, ROC
{sscheng, whm}@iis.sinica.edu.tw

ABSTRACT

In this paper, we propose a generative model for self-organizing maps (SOM). Based on this model, we derive three EM-type algorithms for learning SOM, namely, the SOCEM, SOEM, and SODAEM algorithms. SOCEM is derived by applying the classification EM (CEM) algorithm to the classification likelihood; SOEM is derived by applying the EM algorithm to the mixture likelihood; and SODAEM is a deterministic annealing variant of SOCEM and SOEM. Our experiments on the organizing property of SOM show that, when a small fixed neighborhood is used, SOEM is less sensitive to parameter initialization than SOCEM, while SODAEM can overcome the initialization problem of both SOCEM and SOEM through an annealing process.

1. INTRODUCTION

The self-organizing map (SOM) [1] is a neural network model for data visualization and clustering. The sequential and batch SOM learning algorithms proposed by Kohonen have proved successful in many practical applications. However, they also suffer from some shortcomings, such as the lack of an objective (cost) function, a general proof of convergence, and a probabilistic framework [2]. Several alternative SOM learning algorithms have been proposed to address these issues, as follows.

In [3], the behavior of Kohonen's sequential learning algorithm was studied in terms of energy functions; based on that analysis, Cheng [4] proposed an energy function for SOM whose parameters can be learned by a K-means-type algorithm. Luttrell [5] proposed a noisy vector quantization model called the topographic vector quantizer (TVQ), whose training process coincides with the learning of SOM.
The cost function of TVQ represents the topographic distortion between the input data and the output code vectors in terms of Euclidean distance. Graepel et al. [6] applied the idea of deterministic annealing to the optimization of TVQ's cost function and developed an algorithm for noisy vector quantization called the soft topographic vector quantizer (STVQ). Also on the basis of topographic distortion, Heskes [7] developed an algorithm identical to STVQ by applying another implementation of deterministic annealing. To enable choosing the correct model complexity for SOM by probabilistic assessment, Lampinen and Kostiainen [8] developed a generative model for which the SOM trained by Kohonen's algorithm or TVQ gives the maximum likelihood estimate. Van Hulle developed a kernel-based topographic map formation in [9], where the parameters are adjusted to maximize the joint entropy of the kernel outputs. Later, he developed a new algorithm with heteroscedastic Gaussian mixtures that allows for a unified account of vector quantization, log-likelihood, and Kullback-Leibler divergence [10]. Another probabilistic formulation can be found in [11], where a normalized neighborhood function of SOM is adopted as the posterior distribution in the E-step of the EM algorithm used to learn a mixture model, thereby enforcing self-organization of the mixture components.

Sum et al. [12] interpreted Kohonen's sequential learning algorithm as maximizing the local correlations (coupling energies) between neurons and their neighborhoods for the given input data. Accordingly, they proposed an energy function for SOM that reveals these correlations, together with a gradient ascent learning algorithm for that energy function. Motivated by the work of Sum et al., we propose a generative model for SOM that expresses the local coupling energies over the network as probabilistic likelihoods.
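The neuron-neighborhood coupling that this interpretation rests on can be made concrete with a minimal sketch of Kohonen's sequential SOM rule. The 1-D lattice, Gaussian neighborhood, and fixed learning rate below are illustrative assumptions for the sketch, not the exact setup used in this paper:

```python
import numpy as np

def sequential_som(data, n_neurons=10, n_epochs=20, lr=0.1, sigma=1.0, seed=0):
    """One-dimensional SOM trained with Kohonen's sequential rule (sketch)."""
    rng = np.random.default_rng(seed)
    # One reference model per neuron, initialized within the data range.
    models = rng.uniform(data.min(), data.max(), size=(n_neurons, data.shape[1]))
    grid = np.arange(n_neurons)  # neuron positions on a 1-D lattice
    for _ in range(n_epochs):
        for x in rng.permutation(data):
            # Winner: the neuron whose reference model is closest to the input.
            c = int(np.argmin(np.linalg.norm(models - x, axis=1)))
            # Gaussian neighborhood couples the winner to nearby neurons;
            # its width sigma controls the strength of lateral interaction.
            h = np.exp(-((grid - c) ** 2) / (2.0 * sigma ** 2))
            # Sequential update: pull every model toward x, weighted by h,
            # so neighboring neurons are correlated with the winner.
            models += lr * h[:, None] * (x - models)
    return models
```

Because each update moves the whole neighborhood of the winner, models at adjacent lattice positions come to represent nearby regions of the input space, which is the local-correlation effect the energy-function interpretation captures.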
Based on the proposed model, we develop three EM-type algorithms for learning SOM, namely, the SOCEM, SOEM, and SODAEM algorithms. SOCEM is derived by applying the classification EM (CEM) algorithm [13] to the classification likelihood; SOEM is derived by applying the EM algorithm to the mixture likelihood; and SODAEM is a deterministic annealing variant of SOCEM and SOEM. Because they inherit the properties of the CEM and EM algorithms, all three algorithms offer reliable convergence, low cost per iteration, economy of storage, and ease of programming. From our experiments on the organizing property of SOM, we observe that, when a small fixed neighborhood is used, SOEM is less sensitive to parameter initialization than SOCEM, while SODAEM can overcome the initialization problem of SOCEM and SOEM through an annealing process.

The remainder of this paper is organized as follows. Section 2 describes the formulation of the generative model. Section 3 presents the derivations of the SOCEM, SOEM, and SODAEM algorithms. Section 4 presents the experimental results. Finally, Section 5 presents our conclusions.

2. FORMULATION OF THE GENERATIVE MODEL FOR SOM

The SOM model [1] consists of G neurons in a network R = {r_1, r_2, ..., r_G}, with a neighborhood function h_kl that defines the strength of lateral interaction between two neurons, r_k and r_l, for k, l ∈ {1, 2, ..., G}. Each neuron r_k is associated with a reference model m_k in the input data space.

Sum et al. [12] interpreted Kohonen's sequential SOM learning algorithm as maximizing the local correlations (coupling energies) between the neurons and their neighborhoods with the given