Parameter Clustering and Sharing in Variable-Parameter HMMs for Noise
Robust Speech Recognition
Dong Yu, Li Deng, Yifan Gong, Alex Acero
Microsoft Corporation, Redmond, WA, USA
{dongyu, deng, ygong, alexac}@microsoft.com
Abstract
Recently we proposed a cubic-spline-based variable-
parameter hidden Markov model (CS-VPHMM) whose mean
and variance parameters vary according to some cubic spline
functions of additional environment-dependent parameters.
We have shown good properties of the CS-VPHMM and
demonstrated on the Aurora-3 corpus that the MCE-trained
CS-VPHMM greatly outperforms the MCE-trained conventional
HMM, at the cost of an increased total number of model
parameters. In this paper, we propose to share spline functions
across different Gaussian mixture components to reduce the
total number of model parameters, and we develop a clustering
algorithm to do so. We demonstrate the effectiveness of our
parameter clustering and sharing algorithm for the CS-VPHMM
on the Aurora-3 corpus and show that proper parameter
sharing can reduce the number of parameters from 4 times
that of the conventional HMM to 1.13 times while still achieving
an 18% relative WER reduction over the MCE-trained
conventional HMM under the well-matched condition.
Effective parameter sharing makes the CS-VPHMM an
attractive model for noise robustness.
Index Terms: speech recognition, variable-parameter hidden
Markov model, cubic spline, parameter sharing, clustering
1. Introduction
Recently, Cui and Gong [2] proposed a new model, named
the variable-parameter hidden Markov model (VPHMM), for
robust automatic speech recognition (ASR). In their original
VPHMM, the means and variances of the Gaussian
mixtures change according to a polynomial function of some
environment-dependent conditioning parameter, such as the
signal-to-noise ratio (SNR).
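For concreteness, in the polynomial VPHMM each Gaussian mean (and, analogously, each variance) is a low-order polynomial of the conditioning value. The sketch below is a hypothetical illustration only; the function name and coefficient values are ours, and in [2] the coefficients are trained from data.

```python
def polynomial_mean(snr, coeffs):
    """Evaluate mu(snr) = c0 + c1*snr + c2*snr^2 + ... via Horner's rule.

    coeffs is [c0, c1, c2, ...]; in the polynomial VPHMM these would be
    per-Gaussian, per-dimension trained parameters (invented here).
    """
    mu = 0.0
    for c in reversed(coeffs):
        mu = mu * snr + c
    return mu
```

For example, with coefficients [1.0, 0.5, 0.25] the mean at an SNR of 2.0 is 1 + 0.5·2 + 0.25·4 = 3.0.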
We further advanced the technique with the cubic-spline-
based VPHMM (CS-VPHMM) [6]. In the CS-VPHMM, the
continuous observation density function $b_i(\mathbf{x}_{r,t}, \boldsymbol{\zeta}_{r,t})$ for state $i$,
acoustic observation $\mathbf{x}_{r,t}$, and conditioning parameter $\boldsymbol{\zeta}_{r,t}$
at frame $t$ in utterance $r$ is

$$b_i(\mathbf{x}_{r,t}, \boldsymbol{\zeta}_{r,t}) = \sum_{l=1}^{L} w_{i,l}\, b_{i,l}(\mathbf{x}_{r,t}, \boldsymbol{\zeta}_{r,t}) = \sum_{l=1}^{L} w_{i,l}\, N\!\left(\mathbf{x}_{r,t} \,\middle|\, \boldsymbol{\mu}_{i,l}(\boldsymbol{\zeta}_{r,t}), \boldsymbol{\Sigma}_{i,l}(\boldsymbol{\zeta}_{r,t})\right), \quad (1)$$

where $L$ is the number of Gaussian mixture components, $w_{i,l}$
is a positive weight for the $l$-th Gaussian component with the
constraint $\sum_{l=1}^{L} w_{i,l} = 1$, and
$N(\mathbf{x}_{r,t} \mid \boldsymbol{\mu}_{i,l}(\boldsymbol{\zeta}_{r,t}), \boldsymbol{\Sigma}_{i,l}(\boldsymbol{\zeta}_{r,t}))$ is
the $l$-th Gaussian mixture component, whose mean and
variance vary based on the conditioning parameter $\boldsymbol{\zeta}_{r,t}$. In our
CS-VPHMM, we assume that the covariance matrices are
diagonal and that each dimension $d$ of the mean and variance
vectors can be approximated with a cubic spline as

$$\mu_{i,l,d}(\zeta_{r,t,d}) \approx \mu_{i,l,d}^{(0)} + f\!\left(\zeta_{r,t,d} \,\middle|\, \mu_{c(i,l,d)}^{(1)}, \ldots, \mu_{c(i,l,d)}^{(K)}\right), \quad (2)$$

$$\sigma_{i,l,d}^{2}(\zeta_{r,t,d}) \approx \sigma_{i,l,d}^{(0)2} + f\!\left(\zeta_{r,t,d} \,\middle|\, \sigma_{c(i,l,d)}^{(1)2}, \ldots, \sigma_{c(i,l,d)}^{(K)2}\right), \quad (3)$$

where $\mu_{i,l,d}^{(0)}$ and $\sigma_{i,l,d}^{(0)2}$ are the Gaussian-component-specific
mean and variance, $f(\cdot \mid \cdot)$ denotes the cubic spline function through
$K$ knots, $\mu_{c(i,l,d)}^{(1)}, \ldots, \mu_{c(i,l,d)}^{(K)}$ and $\sigma_{c(i,l,d)}^{(1)2}, \ldots, \sigma_{c(i,l,d)}^{(K)2}$
are the spline knots (to be discussed in Section 2) that can
be shared across different Gaussian mixture components, and
$c(i,l,d)$ is the regression-class mapping, so that many different
$(i,l,d)$ triples may be mapped to the same regression class.
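To make (1)-(3) concrete, the sketch below evaluates a state likelihood with spline-varying means and variances. It is an illustrative reconstruction, not the implementation of [6]: a natural cubic spline stands in for the trained spline $f(\cdot)$, and all function names and parameter values are ours.

```python
import numpy as np

def natural_cubic_spline(knots_x, knots_y):
    """Return S(z): the natural cubic spline through K knots, playing the
    role of f(. | knots) in Eqs. (2)-(3), mapping a conditioning value
    (e.g., instantaneous SNR) to a mean or variance offset."""
    x = np.asarray(knots_x, dtype=float)
    y = np.asarray(knots_y, dtype=float)
    K = len(x)
    h = np.diff(x)
    # Solve the tridiagonal system for second derivatives M_k, with
    # natural boundary conditions M_0 = M_{K-1} = 0.
    A = np.zeros((K, K))
    b = np.zeros(K)
    A[0, 0] = A[-1, -1] = 1.0
    for k in range(1, K - 1):
        A[k, k - 1] = h[k - 1]
        A[k, k] = 2.0 * (h[k - 1] + h[k])
        A[k, k + 1] = h[k]
        b[k] = 6.0 * ((y[k + 1] - y[k]) / h[k] - (y[k] - y[k - 1]) / h[k - 1])
    M = np.linalg.solve(A, b)

    def S(z):
        # Locate the knot interval containing z and evaluate the cubic.
        k = int(np.clip(np.searchsorted(x, z) - 1, 0, K - 2))
        d0 = z - x[k]
        d1 = x[k + 1] - z
        return ((M[k] * d1 ** 3 + M[k + 1] * d0 ** 3) / (6.0 * h[k])
                + (y[k] / h[k] - M[k] * h[k] / 6.0) * d1
                + (y[k + 1] / h[k] - M[k + 1] * h[k] / 6.0) * d0)
    return S

def state_log_likelihood(x, zeta, weights, mu0, var0, mu_splines, var_splines):
    """Eq. (1) with Eqs. (2)-(3): a diagonal-covariance Gaussian mixture
    whose per-dimension mean/variance are static values plus spline
    offsets evaluated at the per-dimension conditioning values zeta."""
    dens = 0.0
    for l, w in enumerate(weights):
        mean = mu0[l] + np.array([s(z) for s, z in zip(mu_splines[l], zeta)])
        var = var0[l] + np.array([s(z) for s, z in zip(var_splines[l], zeta)])
        dens += w * np.exp(-0.5 * np.sum(np.log(2.0 * np.pi * var)
                                         + (x - mean) ** 2 / var))
    return np.log(dens)
```

With all spline knot values set to zero, the offsets vanish and the model reduces to a conventional GMM, which is why the static parameters remain interpretable.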
In our companion paper [6], we developed the
discriminative training algorithm for the CS-VPHMM defined
by (1), (2) and (3). We showed that the CS-VPHMM can use
the dimension-wise instantaneous SNR as the conditioning
parameter and so is much more flexible and powerful than the
polynomial function based VPHMM proposed by Cui and
Gong [2]. We also demonstrated on the Aurora-3 corpus that
the discriminatively trained CS-VPHMM greatly outperforms
the discriminatively trained conventional HMM both with and
without our recently developed Mel-frequency cepstral
minimum mean square error (MFCC-MMSE) motivated noise
suppressor [5], especially under the well-matched condition, at the
cost of an increased total number of model parameters.
In this paper, we explore the parameter-sharing capability
of the CS-VPHMM and answer the question of whether it is
possible to reduce the number of parameters in the CS-VPHMM
without losing the gains achieved when no
parameters are shared. We develop and describe a clustering
algorithm to determine how the splines should be tied, and we
report our experimental results on the Aurora-3 corpus. We show
that proper parameter sharing can reduce the number of
parameters from 4 times that of the conventional HMM to 1.13
times while still achieving an 18% relative WER reduction
over the MCE-trained conventional HMM under the well-matched
condition. Effective parameter sharing makes the
CS-VPHMM an attractive model for noise robustness.
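As a generic illustration of how such tying could be realized (a sketch only, not the clustering algorithm of Section 3), one could group the per-$(i,l,d)$ spline knot vectors into regression classes with a k-means-style procedure and let every member of a cluster share the cluster's spline:

```python
import numpy as np

def cluster_spline_knots(knot_vectors, n_classes, n_iters=20, seed=0):
    """Hypothetical k-means-style tying of spline parameters.

    knot_vectors: (N, K) array, one K-knot spline per (i, l, d) triple.
    Returns (assignments, centroids): each triple is mapped to one of
    n_classes shared splines, reducing N splines to n_classes.
    """
    rng = np.random.default_rng(seed)
    V = np.asarray(knot_vectors, dtype=float)
    # Initialize the shared splines from randomly chosen members.
    centroids = V[rng.choice(len(V), size=n_classes, replace=False)]
    for _ in range(n_iters):
        # Assign each spline to the nearest shared spline (Euclidean).
        dists = np.linalg.norm(V[:, None, :] - centroids[None, :, :], axis=2)
        assign = dists.argmin(axis=1)
        # Re-estimate each shared spline as the mean of its members.
        for c in range(n_classes):
            if np.any(assign == c):
                centroids[c] = V[assign == c].mean(axis=0)
    return assign, centroids
```

A distortion measure tied to recognition accuracy, rather than plain Euclidean distance, would be the natural refinement; the paper's actual criterion is given in Section 3.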
The rest of the paper is organized as follows. In Section 2,
we review some concepts related to the cubic spline and CS-
VPHMM. In Section 3, we describe the detailed spline
clustering algorithm. In Section 4, we report our experimental
results on Aurora-3 with different degrees of parameter
sharing and demonstrate the effectiveness of the clustering
algorithm. We conclude the paper in Section 5.
2. Cubic Spline and CS-VPHMM
In this section, we briefly review some concepts related to the
cubic spline and CS-VPHMM to set the background. Detailed
information on the CS-VPHMM and the discriminative
training algorithm used to estimate the model parameters can
be found in our companion paper [6].
As mentioned in Section 1, the mean and variance of each
Gaussian mixture component in the CS-VPHMM vary
Accepted after peer review of full paper. Copyright © 2008 ISCA.
Interspeech 2008, September 22-26, Brisbane, Australia.
doi: 10.21437/Interspeech.2008-301