Generative Linear Mixture Modelling

Antony Lawson¹, Jochen Einbeck¹

¹ Department of Mathematical Sciences, Durham University, Durham, DH1 3LE, England

E-mail for correspondence: jochen.einbeck@durham.ac.uk

Abstract: For multivariate data with a low-dimensional latent structure, a novel approach to linear dimension reduction based on Gaussian mixture models is proposed. A generative model is assumed for the data, where the mixture centres (or 'mass points') are positioned along lines or planes spanned through the data cloud. All involved parameters are estimated simultaneously through the EM algorithm, requiring an additional iteration within each M-step. Data points can be projected onto the low-dimensional space by taking the posterior mean over the estimated mass points. The compressed data can then be used for further processing, for instance as a low-dimensional predictor in a multivariate regression problem.

Keywords: EM; Dimension Reduction; Mixture Modelling.

1 Introduction

Mixtures of exponential family distributions are often used to model complex data structures, with finite Gaussian mixtures being the most common representative of such models. In this article we are interested in situations where a multivariate data set, $x_i \in \mathbb{R}^m$, $i = 1, \ldots, n$, possesses a latent structure of lower dimension $d < m$ (these 'data' may play the role of a 'predictor space' in a multivariate regression problem, but this is not relevant for the moment). The objective, for now, is to recover the latent structure, and to compress the original data by projecting them (in some form) onto the estimated latent space. As a first step towards a more general handling of this problem, we consider a simplified scenario in which the latent structure is thought to be a straight line, say $\alpha + \beta z$, with $\alpha, \beta \in \mathbb{R}^m$, $z \in \mathbb{R}$, through an $m$-dimensional space.
The variable $z$ is considered as a random effect, and represented by a discrete distribution with mass points $z_k \in \mathbb{R}$ and masses $\pi_k$, $k = 1, \ldots, K$. The data are assumed to be generated by adding Gaussian noise $\varepsilon_i \sim N(0, \Sigma)$ to mixture centres $\alpha + \beta z_k \in \mathbb{R}^m$ positioned along this line, yielding the generative linear mixture model

$$x_i = \alpha + \beta z_k + \varepsilon_i. \tag{1}$$
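
To make the generative reading of model (1) concrete, the following minimal sketch simulates data from it with NumPy. The function name `simulate_glmm` and all numerical values ($m = 3$, $K = 4$, the particular $\alpha$, $\beta$, $z_k$, $\pi_k$ and $\Sigma$) are illustrative assumptions and are not taken from the paper; the sketch covers only data generation, not the EM estimation.

```python
import numpy as np


def simulate_glmm(n, alpha, beta, z, pi, Sigma, seed=0):
    """Draw n points from x_i = alpha + beta * z_k + eps_i, eps_i ~ N(0, Sigma).

    Hypothetical helper for illustration; not the authors' implementation.
    """
    rng = np.random.default_rng(seed)
    alpha = np.asarray(alpha, dtype=float)
    beta = np.asarray(beta, dtype=float)
    z = np.asarray(z, dtype=float)
    # Component label k for each observation, drawn with probabilities pi_k.
    labels = rng.choice(len(z), size=n, p=pi)
    # Mixture centres alpha + beta * z_k, one row per observation (n x m).
    centres = alpha + np.outer(z[labels], beta)
    # Additive Gaussian noise with common covariance Sigma.
    noise = rng.multivariate_normal(np.zeros(len(alpha)), Sigma, size=n)
    return centres + noise, labels


# Illustrative values only (assumed, not from the paper): m = 3, K = 4.
alpha = np.array([1.0, 0.0, -1.0])
beta = np.array([2.0, 1.0, 0.5])
z = np.array([-1.5, -0.5, 0.5, 1.5])
pi = np.array([0.1, 0.4, 0.4, 0.1])
Sigma = 0.05 * np.eye(3)

X, labels = simulate_glmm(500, alpha, beta, z, pi, Sigma)
```

As a quick sanity check, shrinking `Sigma` towards zero makes the rows of `X` collapse onto the $K$ mixture centres on the line $\alpha + \beta z$.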