Parameter Clustering and Sharing in Variable-Parameter HMMs for Noise
Robust Speech Recognition
Dong Yu, Li Deng, Yifan Gong, Alex Acero
Microsoft Corporation, Redmond, WA, USA
{dongyu, deng, ygong, alexac}@microsoft.com
Abstract
Recently we proposed a cubic-spline-based variable-
parameter hidden Markov model (CS-VPHMM) whose mean
and variance parameters vary according to some cubic spline
functions of additional environment-dependent parameters.
We have shown good properties of the CS-VPHMM and
demonstrated on the Aurora-3 corpus that the MCE-trained
CS-VPHMM greatly outperforms the MCE-trained conventional
HMM, at the cost of an increased total number of model
parameters. In this paper, we propose to share spline functions
across different Gaussian mixture components to reduce the
total number of model parameters, and we develop a clustering
algorithm to do so. We demonstrate the effectiveness of our
parameter clustering and sharing algorithm for the CS-VPHMM
on the Aurora-3 corpus and show that proper parameter
sharing can reduce the number of parameters from 4 times
that of the conventional HMM to 1.13 times while still achieving
an 18% relative WER reduction over the MCE-trained
conventional HMM under the well-matched condition.
Effective parameter sharing makes the CS-VPHMM an
attractive model for noise robustness.
Index Terms: speech recognition, variable-parameter hidden
Markov model, cubic spline, parameter sharing, clustering
1. Introduction
Recently, Cui and Gong [2] proposed a new model, named
the variable-parameter hidden Markov model (VPHMM), for
robust automatic speech recognition (ASR). In their original
VPHMM, the means and variances of the Gaussian
mixtures change according to a polynomial function of some
environment-dependent conditioning parameter, such as the
signal-to-noise ratio (SNR).
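For concreteness, in the polynomial VPHMM each Gaussian mean (and, analogously, each variance) is a low-order polynomial of the conditioning value. The sketch below is a hypothetical illustration only; the function name and coefficient values are ours, and in [2] the coefficients are trained from data.

```python
def polynomial_mean(snr, coeffs):
    """Evaluate mu(snr) = c0 + c1*snr + c2*snr^2 + ... via Horner's rule.

    coeffs is [c0, c1, c2, ...]; in the polynomial VPHMM these would be
    per-Gaussian, per-dimension trained parameters (invented here).
    """
    mu = 0.0
    for c in reversed(coeffs):
        mu = mu * snr + c
    return mu
```

For example, with coefficients [1.0, 0.5, 0.25] the mean at an SNR of 2.0 is 1 + 0.5·2 + 0.25·4 = 3.0.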
We further advanced the technique with the cubic-spline-
based VPHMM (CS-VPHMM) [6]. In the CS-VPHMM, the
continuous observation density function $b_i(\mathbf{x}_{r,t}, \boldsymbol{\zeta}_{r,t})$ for state $i$,
acoustic observation $\mathbf{x}_{r,t}$, and conditioning parameter $\boldsymbol{\zeta}_{r,t}$
at frame $t$ in utterance $r$ is

$$b_i(\mathbf{x}_{r,t}, \boldsymbol{\zeta}_{r,t}) = \sum_{l=1}^{L} w_{i,l}\, b_{i,l}(\mathbf{x}_{r,t}, \boldsymbol{\zeta}_{r,t}) = \sum_{l=1}^{L} w_{i,l}\, N\!\left(\mathbf{x}_{r,t} \,\middle|\, \boldsymbol{\mu}_{i,l}(\boldsymbol{\zeta}_{r,t}), \boldsymbol{\Sigma}_{i,l}(\boldsymbol{\zeta}_{r,t})\right), \quad (1)$$

where $L$ is the number of Gaussian mixture components, $w_{i,l}$
is a positive weight for the $l$-th Gaussian component with the
constraint $\sum_{l=1}^{L} w_{i,l} = 1$, and
$N(\mathbf{x}_{r,t} \mid \boldsymbol{\mu}_{i,l}(\boldsymbol{\zeta}_{r,t}), \boldsymbol{\Sigma}_{i,l}(\boldsymbol{\zeta}_{r,t}))$ is
the $l$-th Gaussian mixture component, whose mean and
variance vary based on the conditioning parameter $\boldsymbol{\zeta}_{r,t}$. In our
CS-VPHMM, we assume that the covariance matrices are
diagonal and that each dimension $d$ of the mean and variance
vectors can be approximated with a cubic spline as

$$\mu_{i,l,d}(\zeta_{r,t,d}) \approx \mu_{i,l,d}^{(0)} + f\!\left(\zeta_{r,t,d} \,\middle|\, \mu_{c(i,l,d)}^{(1)}, \ldots, \mu_{c(i,l,d)}^{(K)}\right), \quad (2)$$

$$\sigma_{i,l,d}^{2}(\zeta_{r,t,d}) \approx \sigma_{i,l,d}^{(0)2} + f\!\left(\zeta_{r,t,d} \,\middle|\, \sigma_{c(i,l,d)}^{(1)2}, \ldots, \sigma_{c(i,l,d)}^{(K)2}\right), \quad (3)$$

where $\mu_{i,l,d}^{(0)}$ and $\sigma_{i,l,d}^{(0)2}$ are the Gaussian-component-specific
mean and variance, $f(\cdot \mid \cdot)$ denotes the cubic spline function through
$K$ knots, $\mu_{c(i,l,d)}^{(1)}, \ldots, \mu_{c(i,l,d)}^{(K)}$ and $\sigma_{c(i,l,d)}^{(1)2}, \ldots, \sigma_{c(i,l,d)}^{(K)2}$
are the spline knots (to be discussed in Section 2) that can
be shared across different Gaussian mixture components, and
$c(i,l,d)$ is the regression-class mapping, so that many different
$(i,l,d)$ triples may be mapped to the same regression class.
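To make (1)-(3) concrete, the sketch below evaluates a state likelihood with spline-varying means and variances. It is an illustrative reconstruction, not the implementation of [6]: a natural cubic spline stands in for the trained spline $f(\cdot)$, and all function names and parameter values are ours.

```python
import numpy as np

def natural_cubic_spline(knots_x, knots_y):
    """Return S(z): the natural cubic spline through K knots, playing the
    role of f(. | knots) in Eqs. (2)-(3), mapping a conditioning value
    (e.g., instantaneous SNR) to a mean or variance offset."""
    x = np.asarray(knots_x, dtype=float)
    y = np.asarray(knots_y, dtype=float)
    K = len(x)
    h = np.diff(x)
    # Solve the tridiagonal system for second derivatives M_k, with
    # natural boundary conditions M_0 = M_{K-1} = 0.
    A = np.zeros((K, K))
    b = np.zeros(K)
    A[0, 0] = A[-1, -1] = 1.0
    for k in range(1, K - 1):
        A[k, k - 1] = h[k - 1]
        A[k, k] = 2.0 * (h[k - 1] + h[k])
        A[k, k + 1] = h[k]
        b[k] = 6.0 * ((y[k + 1] - y[k]) / h[k] - (y[k] - y[k - 1]) / h[k - 1])
    M = np.linalg.solve(A, b)

    def S(z):
        # Locate the knot interval containing z and evaluate the cubic.
        k = int(np.clip(np.searchsorted(x, z) - 1, 0, K - 2))
        d0 = z - x[k]
        d1 = x[k + 1] - z
        return ((M[k] * d1 ** 3 + M[k + 1] * d0 ** 3) / (6.0 * h[k])
                + (y[k] / h[k] - M[k] * h[k] / 6.0) * d1
                + (y[k + 1] / h[k] - M[k + 1] * h[k] / 6.0) * d0)
    return S

def state_log_likelihood(x, zeta, weights, mu0, var0, mu_splines, var_splines):
    """Eq. (1) with Eqs. (2)-(3): a diagonal-covariance Gaussian mixture
    whose per-dimension mean/variance are static values plus spline
    offsets evaluated at the per-dimension conditioning values zeta."""
    dens = 0.0
    for l, w in enumerate(weights):
        mean = mu0[l] + np.array([s(z) for s, z in zip(mu_splines[l], zeta)])
        var = var0[l] + np.array([s(z) for s, z in zip(var_splines[l], zeta)])
        dens += w * np.exp(-0.5 * np.sum(np.log(2.0 * np.pi * var)
                                         + (x - mean) ** 2 / var))
    return np.log(dens)
```

With all spline knot values set to zero, the offsets vanish and the model reduces to a conventional GMM, which is why the static parameters remain interpretable.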
In our companion paper [6], we developed the
discriminative training algorithm for the CS-VPHMM defined
by (1), (2) and (3). We showed that the CS-VPHMM can use
the dimension-wise instantaneous SNR as the conditioning
parameter and so is much more flexible and powerful than the
polynomial function based VPHMM proposed by Cui and
Gong [2]. We also demonstrated on the Aurora-3 corpus that
the discriminatively trained CS-VPHMM greatly outperforms
the discriminatively trained conventional HMM both with and
without our recently developed Mel-frequency cepstral
minimum mean square error (MFCC-MMSE) motivated noise
suppressor [5], especially under the well-matched condition, at the
cost of an increased total number of model parameters.
In this paper, we explore the parameter-sharing capability
of the CS-VPHMM and answer the question of whether it is
possible to reduce the number of parameters in the CS-VPHMM
without losing the gains achieved when no
parameters are shared. We develop and describe a clustering
algorithm to determine how the splines should be tied, and we
report our experimental results on the Aurora-3 corpus. We show
that proper parameter sharing can reduce the number of
parameters from 4 times that of the conventional HMM to 1.13
times while still achieving an 18% relative WER reduction
over the MCE-trained conventional HMM under the well-matched
condition. Effective parameter sharing makes the
CS-VPHMM an attractive model for noise robustness.
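As a generic illustration of how such tying could be realized (a sketch only, not the clustering algorithm of Section 3), one could group the per-$(i,l,d)$ spline knot vectors into regression classes with a k-means-style procedure and let every member of a cluster share the cluster's spline:

```python
import numpy as np

def cluster_spline_knots(knot_vectors, n_classes, n_iters=20, seed=0):
    """Hypothetical k-means-style tying of spline parameters.

    knot_vectors: (N, K) array, one K-knot spline per (i, l, d) triple.
    Returns (assignments, centroids): each triple is mapped to one of
    n_classes shared splines, reducing N splines to n_classes.
    """
    rng = np.random.default_rng(seed)
    V = np.asarray(knot_vectors, dtype=float)
    # Initialize the shared splines from randomly chosen members.
    centroids = V[rng.choice(len(V), size=n_classes, replace=False)]
    for _ in range(n_iters):
        # Assign each spline to the nearest shared spline (Euclidean).
        dists = np.linalg.norm(V[:, None, :] - centroids[None, :, :], axis=2)
        assign = dists.argmin(axis=1)
        # Re-estimate each shared spline as the mean of its members.
        for c in range(n_classes):
            if np.any(assign == c):
                centroids[c] = V[assign == c].mean(axis=0)
    return assign, centroids
```

A distortion measure tied to recognition accuracy, rather than plain Euclidean distance, would be the natural refinement; the paper's actual criterion is given in Section 3.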
The rest of the paper is organized as follows. In Section 2,
we review some concepts related to the cubic spline and CS-
VPHMM. In Section 3, we describe the detailed spline
clustering algorithm. In Section 4, we report our experimental
results on Aurora-3 with different degrees of parameter
sharing and demonstrate the effectiveness of the clustering
algorithm. We conclude the paper in Section 5.
2. Cubic Spline and CS-VPHMM
In this section, we briefly review some concepts related to the
cubic spline and CS-VPHMM to set the background. Detailed
information on the CS-VPHMM and the discriminative
training algorithm used to estimate the model parameters can
be found in our companion paper [6].
As mentioned in Section 1, the mean and variance of each
Gaussian mixture component in the CS-VPHMM vary
Accepted after peer review of full paper. Copyright © 2008 ISCA.
Interspeech 2008, September 22-26, Brisbane, Australia.
doi: 10.21437/Interspeech.2008-301