Bayesian Estimation and Classification with Incomplete Data Using Mixture Models

Jufen Zhang  J.Zhang@exeter.ac.uk
Richard Everson  R.M.Everson@exeter.ac.uk
Department of Computer Science, University of Exeter, Exeter, UK

Abstract

Reasoning from data in practical problems is frequently hampered by missing observations. Mixture models provide a powerful general semi-parametric method for modelling densities and have close links to radial basis function neural networks (RBFs). In this paper we extend the Data Augmentation (DA) technique for multiple imputation to Gaussian mixture models to permit fully Bayesian inference of the mixture model parameters and estimation of the missing values. The method is illustrated and compared to imputation using a single normal density on synthetic data and real-world data sets. In addition to a lower mean squared error, mixture models provide valuable information on the potentially multi-modal nature of imputed values, and by modelling the missing data more accurately they achieve higher classification rates than simple mean imputation methods. The DA formalism is extended to a classifier closely related to RBF networks to permit Bayesian classification with incomplete data; the technique is illustrated on synthetic and real-world datasets. This efficient technology enables us to perform Bayesian imputation, parameter estimation and classification simultaneously for data with missing values.

1 INTRODUCTION

Measured data are frequently marred by missing values. When data are plentiful it may be sufficient to discard the incomplete observations, but utilising all the available information for learning and inference is generally important, and it is often necessary to classify or predict an outcome from incomplete predictors.
For example, trauma room or intensive care unit medical data, such as blood pressure, heart rate, injury type, etc., collected in extremis are often incomplete, but it is necessary to make predictions from these data.

Many methods for filling in, or imputing, missing values have been developed (see [Little and Rubin, 2002] for a comprehensive treatment); simple methods are to replace missing values by the mean of the observed values or to regress missing values from the observed data. Maximum likelihood learning via the Expectation-Maximisation (EM) algorithm [Dempster et al., 1977], in which the missing observations are regarded as hidden variables, permits inference of missing values and takes account of the additional uncertainty in parameters caused by missing observations [Ghahramani and Jordan, 1994].

In a Bayesian framework the Data Augmentation (DA) algorithm, introduced by Tanner and Wong [1987], is the natural analogue of the EM algorithm; it amounts to Gibbs sampling from the joint posterior distribution of the parameters and the missing values. Since many samples are drawn for the missing variables, DA is a multiple imputation (MI) technique [Rubin, 1987]. The DA algorithm has been widely used for missing value imputation under the assumption of a normal model [Schafer, 1997]. In this paper we use data augmentation for the inference of missing values and parameters of mixture models. Mixture models are well known as a flexible semi-parametric density model, capable of modelling a wide range of densities. Diebolt and Robert [1994] developed a Gibbs sampling scheme for sampling from the posterior parameter distribution of uni-dimensional mixture models, which we extend to multi-dimensional mixtures in order to incorporate DA. Mixture models are closely related to radial basis function (RBF) neural networks.
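To make the DA mechanics concrete, the following is a minimal sketch (not this paper's implementation, which uses mixture models) of data augmentation under a single bivariate normal model, as in [Schafer, 1997]: an I-step imputes the missing coordinate from its conditional normal distribution, and a P-step draws the parameters from their posterior given the completed data. The toy data, variable names and the noninformative prior are illustrative assumptions.

```python
import numpy as np
from scipy.stats import invwishart

rng = np.random.default_rng(0)

# Toy data: bivariate normal with the second coordinate missing at random.
n, true_mu = 400, np.array([1.0, -1.0])
true_sigma = np.array([[1.0, 0.6], [0.6, 1.0]])
X = rng.multivariate_normal(true_mu, true_sigma, size=n)
miss = rng.random(n) < 0.3          # rows where x2 is unobserved
X[miss, 1] = np.nan

def da_sampler(X, miss, n_iter=500, rng=rng):
    """Data Augmentation for a bivariate normal: alternate an I-step
    (draw x2 | x1, mu, Sigma) with a P-step (draw mu, Sigma | completed data)."""
    Xc = X.copy()
    Xc[miss, 1] = np.nanmean(X[:, 1])       # crude start: mean imputation
    mu_draws = []
    for _ in range(n_iter):
        # P-step: under a noninformative prior, Sigma ~ IW(n-1, S)
        # and mu | Sigma ~ N(xbar, Sigma/n), given the completed data.
        xbar = Xc.mean(axis=0)
        S = (Xc - xbar).T @ (Xc - xbar)
        sigma = invwishart.rvs(df=len(Xc) - 1, scale=S, random_state=rng)
        mu = rng.multivariate_normal(xbar, sigma / len(Xc))
        # I-step: impute x2 from the conditional normal given x1.
        cond_mean = mu[1] + sigma[1, 0] / sigma[0, 0] * (Xc[miss, 0] - mu[0])
        cond_var = sigma[1, 1] - sigma[1, 0] ** 2 / sigma[0, 0]
        Xc[miss, 1] = cond_mean + np.sqrt(cond_var) * rng.standard_normal(miss.sum())
        mu_draws.append(mu)
    return np.array(mu_draws), Xc

mu_draws, X_imputed = da_sampler(X, miss)
print(mu_draws[100:].mean(axis=0))   # posterior mean of mu, after burn-in
```

Because the sampler retains a draw of the missing values at every iteration, the collected imputations form a multiple-imputation sample rather than a single point estimate; the mixture-model extension developed in this paper replaces the single normal with a Gaussian mixture in both steps.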
In a similar manner to Tråvén [1991] and Sykacek [2000], for classification problems with missing data we utilise a mixture model with separate mixing coefficients for each class to model class conditional densities. The mixture