An Entropy Maximization Approach to Optimal Model Selection in Gaussian Mixtures

Antonio Peñalver, Juan M. Sáez, and Francisco Escolano

Robot Vision Group
Departamento de Ciencia de la Computación e Inteligencia Artificial
Universidad de Alicante, Spain
{apenalver, jmsaez, sco}@dccia.ua.es
http://rvg.ua.es

Abstract. In this paper we address the problem of estimating the parameters of a Gaussian mixture model. Although the EM (Expectation-Maximization) algorithm yields the maximum-likelihood solution, it has several problems: (i) it requires a careful initialization of the parameters; (ii) the optimal number of kernels in the mixture may be unknown beforehand. We propose a criterion based on the entropy of the pdf (probability density function) associated to each kernel to measure the quality of a given mixture model, and a modification of the classical EM algorithm to find the optimal number of kernels in the mixture. We test this method with synthetic and real data and compare the results with those obtained with the classical EM with a fixed number of kernels.

1 Introduction

Gaussian mixture models have been widely used in the field of statistical pattern recognition. One of the most common methods for fitting mixtures to data is the EM algorithm [4]. However, this algorithm is sensitive to initialization and, under a poor initialization, may converge to a local maximum of the log-likelihood function. In addition, the algorithm requires that the number of elements (kernels) in the mixture be known beforehand. For a given number of kernels, the EM algorithm yields a maximum-likelihood solution, but this does not ensure that the pdf of the data (multi-dimensional patterns) is properly estimated. A maximum-likelihood criterion with respect to the number of kernels is not useful because it tends to use one kernel to describe each pattern. The so-called model-selection problem has been addressed in many ways.
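As context for the discussion above, the following is a minimal sketch of the classical EM algorithm for a one-dimensional Gaussian mixture with a fixed number of kernels; the function name, the quantile-based initialization, and the synthetic data are illustrative choices, not part of the paper.

```python
import numpy as np

def em_gmm_1d(x, k, n_iter=100):
    """Classical EM for a 1-D Gaussian mixture with a fixed number k of kernels."""
    n = len(x)
    # Initialization (EM is sensitive to this choice; quantiles spread the means).
    pi = np.full(k, 1.0 / k)                       # mixing weights
    mu = np.quantile(x, np.linspace(0.1, 0.9, k))  # kernel means
    var = np.full(k, np.var(x))                    # kernel variances
    for _ in range(n_iter):
        # E-step: posterior responsibility of each kernel for each sample.
        dens = (pi / np.sqrt(2 * np.pi * var)
                * np.exp(-0.5 * (x[:, None] - mu) ** 2 / var))
        resp = dens / dens.sum(axis=1, keepdims=True)
        # M-step: re-estimate parameters from the responsibilities.
        nk = resp.sum(axis=0)
        pi = nk / n
        mu = (resp * x[:, None]).sum(axis=0) / nk
        var = (resp * (x[:, None] - mu) ** 2).sum(axis=0) / nk
    return pi, mu, var

# Synthetic data: two well-separated clusters.
rng = np.random.default_rng(1)
x = np.concatenate([rng.normal(-5, 1, 500), rng.normal(5, 1, 500)])
pi, mu, var = em_gmm_1d(x, k=2)
```

Note that k is fixed throughout: the sketch maximizes likelihood for a given model order, which is exactly the limitation the model-selection methods discussed next try to remove.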
Some approaches start with a small number of kernels and add new kernels when necessary. For instance, in [14] the kurtosis is used as a measure of non-Gaussianity, yielding a test for splitting a kernel in one-dimensional data. In [15] this method is extended to the multi-dimensional case. This approach has some drawbacks, because the kurtosis can be very sensitive to outliers. In [16] a greedy method is proposed, which performs a global search in combination with a local search whenever a new kernel is added.

A. Sanfeliu and J. Ruiz-Shulcloper (Eds.): CIARP 2003, LNCS 2905, pp. 432-439, 2003. Springer-Verlag Berlin Heidelberg 2003
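To make the kurtosis-based criterion and its weakness concrete, here is a small sketch (not the test of [14] itself) of the sample excess kurtosis as a non-Gaussianity score, together with its sensitivity to a single outlier; the data and threshold behavior shown are illustrative assumptions.

```python
import numpy as np

def excess_kurtosis(x):
    """Sample excess kurtosis: approximately 0 for Gaussian data, so a large
    |value| suggests the samples assigned to a kernel are non-Gaussian."""
    z = (x - x.mean()) / x.std()
    return np.mean(z ** 4) - 3.0

rng = np.random.default_rng(0)
gaussian = rng.normal(0, 1, 20000)
# A bimodal sample that a single kernel fits poorly: strongly negative score.
bimodal = np.concatenate([rng.normal(-3, 1, 10000), rng.normal(3, 1, 10000)])
# The drawback noted above: one extreme outlier inflates the score enormously,
# so the split test may fire even when the bulk of the data is Gaussian.
with_outlier = np.append(gaussian, 50.0)
```

Here the Gaussian sample scores near zero and the bimodal one clearly away from zero, which is what makes the statistic usable as a split test; but the single-outlier sample scores far larger than the genuinely bimodal one, illustrating why outlier sensitivity undermines the criterion.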