Penalized Maximum Likelihood Estimator for Normal Mixtures

GABRIELA CIUPERCA, Université Lyon 1
ANDREA RIDOLFI, CNRS-Laboratoire des Signaux et Systèmes and École Polytechnique Fédérale de Lausanne
JÉRÔME IDIER, CNRS-Laboratoire des Signaux et Systèmes

ABSTRACT. The estimation of the parameters of a mixture of Gaussian densities is considered, within the framework of maximum likelihood. Owing to the unboundedness of the likelihood function, the maximum likelihood estimator fails to exist. We adopt a solution to likelihood degeneracy which consists in penalizing the likelihood function. The resulting penalized likelihood function is bounded over the parameter space, so the existence of the penalized maximum likelihood estimator is guaranteed. As an original contribution, we provide asymptotic properties, and in particular a consistency proof, for the penalized maximum likelihood estimator. Numerical examples in the finite-data case show the performance of the penalized estimator compared with that of the standard one.

Key words: Bayesian estimation, mixtures of normal distributions, penalized maximum likelihood, strong consistency

1. Introduction

Mixture distributions are typically used to model data in which each observation is assumed to come from one of p different groups, each group being suitably modelled by a probability density belonging to a parametric family. They are well suited to clustering observations into groups for discrimination or classification: the mixture proportions then represent the relative frequency of occurrence of each group in the population. Mixture models also provide a convenient and flexible class of models for estimating or approximating distributions. The first attempts to analyse a mixture model are often attributed to Pearson (1894) but, as stated in Butler (1986), Newcomb (1886) predated Pearson's work. Since then, mixture models have been used in a wide range of applications.
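The degeneracy described in the abstract can be sketched numerically. The following Python fragment is our own illustration, not code from the paper: the penalty used is a generic inverse-gamma-type term on each variance, with illustrative hyperparameters, and is not necessarily the penalty analysed by the authors. Centring one mixture component on a data point and shrinking its standard deviation makes the log-likelihood grow without bound, while the penalized criterion is driven to minus infinity instead.

```python
import math

def normal_pdf(x, mu, sigma):
    # Univariate normal density
    return math.exp(-0.5 * ((x - mu) / sigma) ** 2) / (sigma * math.sqrt(2.0 * math.pi))

def log_likelihood(data, pi, mu1, s1, mu2, s2):
    # Log-likelihood of a two-component normal mixture
    return sum(math.log(pi * normal_pdf(x, mu1, s1)
                        + (1.0 - pi) * normal_pdf(x, mu2, s2)) for x in data)

def penalty(s1, s2, a=1.0, b=1.0):
    # Inverse-gamma-type penalty on each variance: tends to -infinity as
    # sigma -> 0, counteracting the likelihood singularity.  The
    # hyperparameters a, b are illustrative, not values from the paper.
    return sum(-a * math.log(s ** 2) - b / s ** 2 for s in (s1, s2))

data = [0.0, 1.2, -0.7, 2.1, 0.4]
for s in (1.0, 0.1, 1e-3, 1e-6):
    # First component centred exactly on data[0]; its sigma shrinks to 0.
    ll = log_likelihood(data, 0.5, data[0], s, 0.0, 1.0)
    print(f"sigma={s:g}  log-lik={ll:.2f}  penalized={ll + penalty(s, 1.0):.2f}")
```

As sigma shrinks, the unpenalized log-likelihood keeps increasing (the singularity), whereas the penalized criterion decreases, so its maximizer stays away from the degenerate boundary of the parameter space.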
In particular, independent mixture models are well suited to several problems in signal and image processing. An example of the application of mixtures to biological (plant morphology measures) and physiological (EEG signal) data modelling is presented in Roberts et al. (1998). In Champagnat et al. (1996), a Bernoulli–Gaussian mixture model is adopted in a deconvolution problem. McLachlan & Basford (1987) highlight the important role of mixture models in the field of cluster analysis, and Biernacki et al. (1997) propose a model selection criterion applied to multivariate real data sets. Markovian mixture models are also commonly used, as in Ridolfi (1997), where an application to medical image segmentation is considered.

In our study we consider mixtures of p univariate normal densities, with p known, defined as

f(x; θ) = Σ_{i=1}^{p} π_i φ(x; μ_i, σ_i²),

where φ(·; μ, σ²) denotes the normal density with mean μ and variance σ², the weights π_i are non-negative and sum to one, and θ collects all the mixture parameters.

© Board of the Foundation of the Scandinavian Journal of Statistics 2003. Published by Blackwell Publishing Ltd, 9600 Garsington Road, Oxford OX4 2DQ, UK and 350 Main Street, Malden, MA 02148, USA. Vol 30: 45–59, 2003.
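Such a p-component univariate normal mixture density can be evaluated directly from its definition. The sketch below is our own, with hypothetical variable names (weights, means, sigmas); it assumes the weights are non-negative and sum to one.

```python
import math

def normal_pdf(x, mu, sigma):
    # Univariate normal density with mean mu and standard deviation sigma
    return math.exp(-0.5 * ((x - mu) / sigma) ** 2) / (sigma * math.sqrt(2.0 * math.pi))

def mixture_pdf(x, weights, means, sigmas):
    # Density of a mixture of p univariate normals: a weighted sum of
    # component densities (weights assumed non-negative, summing to one).
    return sum(w * normal_pdf(x, m, s)
               for w, m, s in zip(weights, means, sigmas))

# Example: a two-component mixture evaluated at the origin
print(mixture_pdf(0.0, [0.3, 0.7], [0.0, 2.0], [1.0, 1.0]))
```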