INITIALIZING SUBSPACE CONSTRAINED GAUSSIAN MIXTURE MODELS

Peder A. Olsen, Karthik Visweswariah and Ramesh Gopinath
IBM T.J. Watson Research Center
{pederao,kv1,rameshg}@us.ibm.com

ABSTRACT

A recent series of papers [1, 2, 3, 4] introduced Subspace Constrained Gaussian Mixture Models (SCGMMs) and showed that SCGMMs can very efficiently approximate Full Covariance Gaussian Mixture Models (FCGMMs); a significant reduction in the number of parameters is achieved with little loss in the accuracy of the model. SCGMMs were arrived at as a sequence of generalizations of diagonal covariance GMMs. As an artifact of this process, the initialization of SCGMM parameters in that work is complex, i.e., it relies on the best parameter settings of less general models. This paper overcomes this problem by showing how an FCGMM can be used to give a simple and direct initialization of an SCGMM. The initialization scheme is powerful enough that, as the number of parameters in an SCGMM approaches that of an FCGMM (i.e., for large SCGMMs), further training of the SCGMM is unnecessary.

1. INTRODUCTION

In most state-of-the-art speech recognition systems, hidden Markov models (HMMs) are used to estimate the likelihood of an acoustic observation given a word sequence. One of the key ingredients of an HMM is a probability distribution $p(x \mid s)$ for the acoustic vector $x \in \mathbb{R}^d$ at a particular time, conditioned on an HMM state $s$. Typically, $p(x \mid s)$ is taken to be a Gaussian mixture model (GMM) or, more generally, a mixture of exponential models:

$$p(x \mid s) = \sum_{g \in s} \pi_g \, \mathcal{E}(x; \theta_g, f), \tag{1}$$

where

$$\mathcal{E}(x; \theta, f) = \frac{e^{\theta^\top f(x)}}{Z(\theta)} \tag{2}$$

is the general exponential model and $Z(\theta) = \int_{\mathbb{R}^d} e^{\theta^\top f(x)}\, dx$ is the normalizer for the exponential distribution. For computational and storage reasons, most speech recognition systems take $\mathcal{E}(x; \theta_g, f)$ to be a diagonal Gaussian distribution.
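A minimal sketch of eqs. (1) and (2), assuming diagonal Gaussian components (the common case mentioned above). The feature map $f(x) = (x^\top, -\tfrac{1}{2}(x \circ x)^\top)^\top$, the natural parameters $\theta_g = (\mu_g/\sigma_g^2, 1/\sigma_g^2)$ and all function names are illustrative assumptions, not taken from the paper. The mixture sum is evaluated in the log domain with the log-sum-exp trick.

```python
import numpy as np


def diag_features(x):
    # Hypothetical feature map for diagonal Gaussians: f(x) = (x, -x*x/2).
    return np.concatenate([x, -0.5 * x * x])


def diag_theta_and_log_z(mu, var):
    # Natural parameters theta = (mu/var, 1/var) for the features above, and
    # log Z(theta) = sum_i [0.5*log(2*pi*var_i) + mu_i**2 / (2*var_i)].
    theta = np.concatenate([mu / var, 1.0 / var])
    log_z = np.sum(0.5 * np.log(2.0 * np.pi * var) + mu**2 / (2.0 * var))
    return theta, log_z


def mixture_log_likelihood(x, weights, thetas, log_zs, features):
    # log p(x|s) = log sum_g pi_g * exp(theta_g^T f(x) - log Z(theta_g)),
    # computed with the log-sum-exp trick to avoid underflow.
    terms = np.log(weights) + thetas @ features(x) - log_zs
    m = np.max(terms)
    return m + np.log(np.sum(np.exp(terms - m)))


# Two-component example in d = 2 (toy values, not from the paper).
mus = np.array([[0.0, 1.0], [2.0, -1.0]])
vars_ = np.array([[1.0, 0.5], [2.0, 1.5]])
weights = np.array([0.3, 0.7])
params = [diag_theta_and_log_z(m, v) for m, v in zip(mus, vars_)]
thetas = np.stack([theta for theta, _ in params])
log_zs = np.array([log_z for _, log_z in params])
x = np.array([0.5, 0.2])
ll = mixture_log_likelihood(x, weights, thetas, log_zs, diag_features)
print(ll)
```

Each component contributes $\log \pi_g + \theta_g^\top f(x) - \log Z(\theta_g)$, so only $f(x)$ depends on the observation; the normalizers can be precomputed once per component.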
A recent series of papers [1, 2, 3, 4] introduced Subspace Constrained Gaussian Mixture Models (SCGMMs), which provide an efficient "slider" between Diagonal Covariance GMMs (DCGMMs) and Full Covariance GMMs (FCGMMs). In that work, SCGMMs were arrived at via a sequence of generalizations of DCGMMs; hence, for historical reasons, the initialization of SCGMM parameters was complex, i.e., it relied on the best available parameter settings of less general models. Effectively, that approach required training software for the less general models in order to arrive at an initialization for an SCGMM. This paper overcomes that problem by providing a simple and direct method to initialize the parameters of an SCGMM.

A full covariance Gaussian

$$\mathcal{N}(x; \Sigma, \mu) = \frac{\exp\left(-\tfrac{1}{2}(x-\mu)^\top \Sigma^{-1} (x-\mu)\right)}{\sqrt{\det(2\pi\Sigma)}} \tag{3}$$

can be written in the form of an exponential model as follows:

$$\mathcal{N}(x; \Sigma, \mu) = \mathcal{E}(x; \theta_{fc}, f_{fc}) = \frac{e^{\theta_{fc}^\top f_{fc}(x)}}{Z_{fc}(\theta_{fc})}, \tag{4}$$

where we define the full covariance features $f_{fc}$ to be

$$f_{fc}(x) = \left(x^\top, -\operatorname{vec}(xx^\top)^\top\right)^\top, \tag{5}$$

and vec is an operator on symmetric matrices that returns a vector containing the elements of the lower triangular portion, with the diagonal scaled by $1/\sqrt{2}$:

$$\operatorname{vec}(X) = \left(\frac{X_{11}}{\sqrt{2}}, X_{12}, \frac{X_{22}}{\sqrt{2}}, X_{13}, \ldots, \frac{X_{dd}}{\sqrt{2}}\right)^\top. \tag{6}$$

It can be verified that, in terms of these features, the model parameters can be written $\theta_{fc} = (\psi^\top, p^\top)^\top \in \mathbb{R}^{d(d+3)/2}$, where the model parameters $\psi \in \mathbb{R}^d$ and $p \in \mathbb{R}^{d(d+1)/2}$ correspond to the linear and quadratic features, respectively. In terms of the precision matrix $P = \Sigma^{-1}$, the quadratic model parameters are

$$p = \operatorname{vec}(P) \tag{7}$$

and the linear model parameters are

$$\psi = P\mu. \tag{8}$$

The $1/\sqrt{2}$ scaling ensures that $p^\top \operatorname{vec}(xx^\top) = \tfrac{1}{2} x^\top P x$, so $\theta_{fc}^\top f_{fc}(x) = \psi^\top x - \tfrac{1}{2} x^\top P x$ reproduces the $x$-dependent part of the exponent in (3). The normalizer for the full covariance model in terms of $P$ and $\psi$ is

$$-2 \log Z_{fc}(\theta_{fc}) = \log\det\left(\frac{P}{2\pi}\right) - \psi^\top P^{-1} \psi. \tag{9}$$

A Subspace Constrained Gaussian [1] is an exponential model with features $\Phi f_{fc}(x) \in \mathbb{R}^D$ and parameters $\lambda \in \mathbb{R}^D$, where $\Phi \in \mathbb{R}^{D \times d(d+3)/2}$.
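The following sketch checks numerically, under the conventions of eqs. (5) through (9), that the exponential form reproduces the usual full covariance log-density. The helper names `vec`, `fc_features` and `fc_params` are hypothetical, and the ordering inside `vec` follows eq. (6); any fixed ordering would work as long as features and parameters use the same one.

```python
import numpy as np


def vec(X):
    # eq. (6): entries X_ij with i <= j, ordered X11, X12, X22, X13, ..., Xdd
    # (the lower triangle, by symmetry), diagonal scaled by 1/sqrt(2).
    d = X.shape[0]
    return np.array([X[i, j] / np.sqrt(2.0) if i == j else X[i, j]
                     for j in range(d) for i in range(j + 1)])


def fc_features(x):
    # eq. (5): f_fc(x) = (x^T, -vec(x x^T)^T)^T, of length d(d+3)/2.
    return np.concatenate([x, -vec(np.outer(x, x))])


def fc_params(mu, sigma):
    # eqs. (7)-(8): P = Sigma^{-1}, p = vec(P), psi = P mu.
    P = np.linalg.inv(sigma)
    psi = P @ mu
    theta = np.concatenate([psi, vec(P)])
    # eq. (9): -2 log Z = log det(P / (2 pi)) - psi^T P^{-1} psi.
    _, logdet = np.linalg.slogdet(P / (2.0 * np.pi))
    log_z = -0.5 * (logdet - psi @ np.linalg.solve(P, psi))
    return theta, log_z


# Toy 3-dimensional Gaussian (values chosen only for illustration).
mu = np.array([1.0, -0.5, 2.0])
sigma = np.array([[2.0, 0.3, 0.1],
                  [0.3, 1.0, -0.2],
                  [0.1, -0.2, 0.5]])
theta, log_z = fc_params(mu, sigma)
x = np.array([0.2, 0.4, 1.5])
log_density = theta @ fc_features(x) - log_z
print(theta.shape, log_density)
```

With $d = 3$ the parameter vector has $d(d+3)/2 = 9$ entries: three linear parameters $\psi$ and six quadratic parameters $p$.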
$\mathcal{S}(x; \lambda, \Phi)$ denotes the Subspace Constrained Gaussian and satisfies the relation

$$\mathcal{S}(x; \lambda, \Phi) = \mathcal{E}(x; \lambda, \Phi f_{fc}). \tag{10}$$

This paper describes how, with minimal computational effort, one can obtain a good initial value for the basis matrix $\Phi$ and the exponential model parameters $\lambda_g$, $g \in s$, from an FCGMM. As the name suggests, an SCGMM can be viewed as an FCGMM whose full covariance exponential model parameters are constrained to lie in a subspace, i.e., $\theta_g = \Phi^\top \lambda_g = \sum_{k=1}^{D} \lambda_{gk} \phi_k$, where $\phi_k$ is the vector corresponding to the $k$th row of $\Phi$. Section 2 describes how to find a good initial basis $\Phi$, Section 3 describes how to initialize $\lambda_g$, and Section 4 gives experimental results with the new initialization scheme.
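As a hedged sketch of eq. (10), the code below evaluates a single Subspace Constrained Gaussian by mapping $\lambda$ back to $\theta = \Phi^\top \lambda$, unpacking $\psi$ and $P$, and computing the normalizer with eq. (9). It does not implement the initialization of Sections 2 and 3. The basis (called `phi` in the code) is a hypothetical two-row example built from the natural parameters of two Gaussians, and the helper names are illustrative. The normalizer is finite only when the implied $P$ is positive definite, which the sketch checks with a Cholesky factorization.

```python
import numpy as np


def vec(X):
    # eq. (6): X11/sqrt(2), X12, X22/sqrt(2), X13, ..., Xdd/sqrt(2).
    d = X.shape[0]
    return np.array([X[i, j] / np.sqrt(2.0) if i == j else X[i, j]
                     for j in range(d) for i in range(j + 1)])


def unvec(v, d):
    # Inverse of vec: rebuild the symmetric matrix and undo the diagonal scaling.
    X = np.zeros((d, d))
    k = 0
    for j in range(d):
        for i in range(j + 1):
            if i == j:
                X[i, i] = v[k] * np.sqrt(2.0)
            else:
                X[i, j] = X[j, i] = v[k]
            k += 1
    return X


def fc_features(x):
    # eq. (5): f_fc(x) = (x^T, -vec(x x^T)^T)^T.
    return np.concatenate([x, -vec(np.outer(x, x))])


def scg_log_density(x, lam, phi):
    # eq. (10): log S(x; lam, Phi) = lam^T Phi f_fc(x) - log Z(theta),
    # where theta = Phi^T lam is split into psi (first d) and p = vec(P).
    d = x.shape[0]
    theta = phi.T @ lam
    psi, P = theta[:d], unvec(theta[d:], d)
    # Cholesky raises LinAlgError if P is not positive definite (Z infinite).
    L = np.linalg.cholesky(P)
    logdet_P = 2.0 * np.sum(np.log(np.diag(L)))
    # eq. (9): -2 log Z = log det(P / (2 pi)) - psi^T P^{-1} psi.
    log_z = -0.5 * (logdet_P - d * np.log(2.0 * np.pi) - psi @ np.linalg.solve(P, psi))
    return lam @ (phi @ fc_features(x)) - log_z


# Hypothetical D = 2 basis whose rows are natural parameters of two Gaussians.
Pa, mua = np.array([[2.0, 0.5], [0.5, 1.0]]), np.array([0.0, 1.0])
Pb, mub = np.array([[1.0, -0.2], [-0.2, 3.0]]), np.array([1.0, -1.0])
phi = np.stack([np.concatenate([Pa @ mua, vec(Pa)]),
                np.concatenate([Pb @ mub, vec(Pb)])])
x = np.array([0.3, -0.7])
ll = scg_log_density(x, np.array([0.5, 0.5]), phi)
print(ll)
```

With $\lambda = (1, 0)$ this toy model reduces to the first Gaussian; with $\lambda = (0.5, 0.5)$ it is the Gaussian whose precision matrix and linear parameters are the averages of the two. Note that averaging natural parameters is not the same as averaging means and covariances.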