TIME-INHOMOGENEOUS HIDDEN BERNOULLI MODEL:
AN ALTERNATIVE TO HIDDEN MARKOV MODEL FOR AUTOMATIC SPEECH RECOGNITION
Jahanshah Kabudian¹, M. Mehdi Homayounpour¹, S. Mohammad Ahadi²
¹ Department of Computer Engineering,
² Department of Electrical Engineering,
AmirKabir University of Technology (Tehran Polytechnic), Tehran, IRAN.
{kabudian, homayoun, sma} at aut.ac.ir
ABSTRACT
In this paper, a new acoustic model called the Time-Inhomogeneous Hidden Bernoulli Model (TI-HBM) is introduced as an alternative to the Hidden Markov Model (HMM) in automatic speech recognition. Contrary to HMM, the state transition process in TI-HBM is not a Markov process; rather, it is an independent (generalized Bernoulli) process. This difference eliminates state-level dynamic programming from the TI-HBM decoding process. Thus, the computational complexity of TI-HBM for Probability Evaluation and State Estimation is $O(NL)$ (instead of $O(N^2 L)$ in the HMM case). As a new framework for phone duration modeling, TI-HBM is able to model acoustic-unit duration (e.g., phone duration) by using a built-in parameter named survival probability. As in the HMM case, the three essential problems of TI-HBM have been solved, and an EM-algorithm-based method has been proposed for training TI-HBM parameters. Phone recognition experiments on spoken Persian (Farsi) show that TI-HBM has some advantages over HMM (e.g., greater simplicity and faster recognition) and also outperforms HMM in terms of phone recognition accuracy.
Index Terms— Time-Inhomogeneous Hidden Bernoulli Model,
Hidden Markov Model, Speech Recognition, Acoustic Modeling,
Phone Recognition, Phone Duration Modeling, Persian (Farsi)
Spoken Language.
1. INTRODUCTION
The Hidden Markov Model (HMM) is the most popular and most successful tool for analyzing and modeling stochastic sequences in speech processing [1]. The usual assumption in HMM is that the state transition process is a Markov process, i.e., the generated state sequence obeys a Markov regime. It has been observed experimentally that state transition probabilities play a less important role than observation density functions. There has been no attempt to relax the Markov dependency in acoustic models such as HMM. In this paper, a new acoustic model named TI-HBM is proposed in which the Markov regime of the state transition process is relaxed. There have been many attempts at phone duration modeling [2,3,4]. TI-HBM models acoustic-unit duration (e.g., phone duration) by using a built-in parameter named survival probability, which is derived from the joint state-time distribution parameters. In the next sections, we introduce TI-HBM and its basic definitions.
2. TI-HBM
TI-HBM is a new acoustic model that is able to simultaneously model both state transition and acoustic-unit (e.g., phone) duration by using a new parameter called the Joint State-Time Distribution, $P_{S,T}(i,t)$. The parameter $P(i,t)$ (shorthand for $P_{S,T}(i,t)$) is the probability of being in state $i$ at time $t$. Therefore, the parameters of TI-HBM are:
1. Joint State-Time Distribution $P(i,t)$.
2. Parameters of the Gaussian mixtures, i.e., $w_{im}$, $\mu_{im}$ and $C_{im}$ (weight, mean vector and covariance matrix of mixture component $m$ in state $i$).
The parameters $P(i,t)$ play roles similar to $\pi_i$ and $a_{ij}$ in a standard HMM. The following constraints must be satisfied:
$$\sum_{i=1}^{N} \sum_{t=1}^{L_{\max}} P(i,t) = 1 \tag{2.1}$$

$$P(i,t) = 0 \quad \text{for } t > L_{\max} \tag{2.2}$$

where $L_{\max}$ is the maximum length of the observation sequence $X$.
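As a minimal sketch of constraints (2.1) and (2.2), the Python function below checks a candidate table $P(i,t)$. It assumes (our storage choice, not the paper's) that the table is kept as $N$ rows of $L_{\max}$ values, so (2.2) holds by construction because no entry beyond $L_{\max}$ is stored. It also checks non-negativity, which any probability table requires. The name `check_joint_state_time` is hypothetical.

```python
import math

def check_joint_state_time(P, tol=1e-9):
    """Check Eq. (2.1) for a table P[i][t-1] = P(i, t), i = 1..N, t = 1..L_max.

    Assumed layout: N rows (states), each with L_max entries (times).
    Eq. (2.2) is implicit: no entry for t > L_max is stored at all.
    """
    if not P or any(len(row) != len(P[0]) for row in P):
        raise ValueError("P must be a non-empty N x L_max table")
    if any(p < 0.0 for row in P for p in row):
        return False  # a probability table cannot contain negative entries
    total = math.fsum(p for row in P for p in row)  # accurate double sum over i and t
    return abs(total - 1.0) <= tol
```

For example, the $2 \times 3$ table `[[0.2, 0.1, 0.05], [0.3, 0.25, 0.1]]` passes, while any table whose entries sum to 2 does not.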
From $P(i,t)$ we derive some further parameters that are needed for applying TI-HBM in real-world tasks:
1. Time distribution function $P_T(t)$ or $P(t)$:
$P_T(t)$ is the probability of being at time $t$, and is computed as follows:

$$P(t) = \sum_{i=1}^{N} P(i,t) \tag{2.3}$$
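Eq. (2.3) is a marginalization over states. A minimal sketch, using the same hypothetical $N \times L_{\max}$ list layout (list index 0 corresponds to $t = 1$); if the table satisfies (2.1), the resulting $P_T(t)$ sums to 1 over $t$.

```python
def time_distribution(P):
    """Eq. (2.3): P_T(t) = sum over states i of P(i, t); index 0 <-> t = 1."""
    n_times = len(P[0])
    return [sum(row[t] for row in P) for t in range(n_times)]
```

For the table `[[0.2, 0.1, 0.05], [0.3, 0.25, 0.1]]` this gives approximately 0.5, 0.35 and 0.15 for $t = 1, 2, 3$.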
If we have $K$ observation sequences, where the $k$-th sequence has length $L_k$, the time distribution function is computed as the relative frequency of observation vectors with time index $t$ (frame number $t$). Therefore, $P_T(t)$ is estimated empirically by the following formula:
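$$\hat{P}(t) = \frac{\sum_{k=1}^{K} \mathbf{1}(t \le L_k)}{\sum_{k=1}^{K} L_k} \tag{2.4}$$

where

$$\mathbf{1}(\mathrm{cond}) = \begin{cases} 1 & \text{if cond is TRUE} \\ 0 & \text{if cond is FALSE} \end{cases} \tag{2.5}$$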
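A sketch of the empirical estimate (2.4), with the indicator (2.5) implemented as `int(cond)`. The numerator counts the sequences still running at frame $t$ (i.e., the frames whose index is $t$), and the denominator is the total number of frames, so the estimates sum to 1. The function name and the optional `l_max` argument are our own illustrative choices.

```python
def empirical_time_distribution(lengths, l_max=None):
    """Eq. (2.4): P_hat(t) = sum_k 1(t <= L_k) / sum_k L_k, for t = 1..l_max.

    `lengths` holds L_1..L_K; the indicator 1(cond) of Eq. (2.5) is int(cond).
    Returned list index 0 corresponds to t = 1.
    """
    if not lengths or min(lengths) < 1:
        raise ValueError("need at least one sequence, each of length >= 1")
    if l_max is None:
        l_max = max(lengths)
    total_frames = sum(lengths)  # denominator: every frame of every sequence
    return [sum(int(t <= L) for L in lengths) / total_frames
            for t in range(1, l_max + 1)]
```

For three sequences of lengths 2, 3 and 3 (8 frames in total), the estimate is 3/8, 3/8 and 2/8 for $t = 1, 2, 3$.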
2. Survival probability $P_{T_{next}|T_{curr}}(t+1|t)$ or $P(t+1|t)$:
Given that the process is at time $t$, $P(t+1|t)$ is the probability that the process survives to time $t+1$. In other words, at time $t$ the process continues to time $t+1$ with probability $P(t+1|t)$; otherwise it terminates at time $t$ with probability $1 - P(t+1|t)$. $P_{T_{next}|T_{curr}}(t+1|t)$ is computed using Bayes' rule as follows:
$$P_{T_{next}|T_{curr}}(t+1|t) = \frac{P_{T_{next},T_{curr}}(t+1,\,t)}{P_{T_{curr}}(t)} \tag{2.6}$$
Since every sequence length $L_k$ is greater than zero:

$$P_{T_{next}|T_{curr}}(1|0) = 1 \tag{2.7}$$
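A sketch of (2.6) and (2.7) computed from a time distribution $P_T(t)$. It rests on an assumption we add: a sequence can reach frame $t+1$ only by passing through frame $t$, so the joint probability in the numerator of (2.6) reduces to $P_T(t+1)$ and the survival probability becomes $P_T(t+1)/P_T(t)$. For $t = L_{\max}$ the numerator is 0 by (2.2). Returning 0 where $P_T(t) = 0$ is our own convention, since (2.6) is undefined there.

```python
def survival_probabilities(p_time):
    """Survival probabilities P(t+1 | t) for t = 0..L_max (list index = t).

    p_time[t-1] holds P_T(t) for t = 1..L_max.
    Assumption: the joint P(t+1, t) in Eq. (2.6) equals P_T(t+1), since a
    sequence can reach frame t+1 only by passing through frame t.
    """
    l_max = len(p_time)
    surv = [1.0]  # Eq. (2.7): P(1 | 0) = 1, every sequence reaches t = 1
    for t in range(1, l_max + 1):
        p_curr = p_time[t - 1]                    # P_T(t)
        p_next = p_time[t] if t < l_max else 0.0  # P_T(t+1); 0 beyond L_max, Eq. (2.2)
        # Convention (assumption): survival is set to 0 where P_T(t) = 0
        surv.append(p_next / p_curr if p_curr > 0 else 0.0)
    return surv
```

Fed with the estimate 3/8, 3/8, 2/8 obtained from lengths 2, 3 and 3, it returns 1, 1, 2/3 and 0 for $t = 0, \dots, 3$: every sequence reaches frame 2, two of the three reach frame 3, and none go beyond.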
Using survival probabilities, TI-HBM is able to model acoustic-unit duration.
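To illustrate how survival probabilities encode duration, the sketch below applies the continue/terminate interpretation given above: a segment lasts exactly $d$ frames if it survives every step from $t = 0$ to $t = d$ and then terminates, so $P(L = d) = \prod_{t=0}^{d-1} P(t+1|t)\,\bigl(1 - P(d+1|d)\bigr)$. This formula is our own consequence of the definitions, not an equation stated in this section, and `duration_pmf` is a hypothetical name.

```python
def duration_pmf(surv):
    """P(L = d) for d = 1..len(surv)-1, given surv[t] = P(t+1 | t).

    A segment of length d survives steps 0 -> 1 -> ... -> d, then stops at d:
    P(L = d) = prod_{t=0}^{d-1} P(t+1 | t) * (1 - P(d+1 | d)).
    """
    pmf = []
    reach = 1.0  # probability of reaching the current frame
    for d in range(1, len(surv)):
        reach *= surv[d - 1]                 # survive from frame d-1 to frame d
        pmf.append(reach * (1.0 - surv[d]))  # then terminate at frame d
    return pmf
```

With survival values 1, 1, 2/3 and 0, it returns 0, 1/3 and 2/3 for $d = 1, 2, 3$, recovering the empirical length distribution of the three sequences of lengths 2, 3 and 3.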
3. State selection probability given time $P_{S|T}(i|t)$ or $P(i|t)$:
$P_{S|T}(i|t)$ is the probability of selecting state $i$ at time $t$, and is computed using the following formula:
$$P_{S|T}(i|t) = \frac{P_{S,T}(i,t)}{P_T(t)} = \frac{P_{S,T}(i,t)}{\sum_{j=1}^{N} P_{S,T}(j,t)} \tag{2.8}$$
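Eq. (2.8) normalizes each time column of $P(i,t)$. The sketch below computes $P(i|t)$ with the same hypothetical list layout and, because state selection at each frame depends only on $t$ (the independent process described in the abstract), draws a state sequence by sampling each frame separately with Python's `random.choices`. Function names are illustrative, and returning 0 for an all-zero column is our own convention.

```python
import random

def state_selection(P):
    """Eq. (2.8): P(i | t) = P(i, t) / sum_j P(j, t); rows = states, cols = t."""
    n_states, n_times = len(P), len(P[0])
    cond = [[0.0] * n_times for _ in range(n_states)]
    for t in range(n_times):
        p_t = sum(P[j][t] for j in range(n_states))  # P_T(t), Eq. (2.3)
        for i in range(n_states):
            # Convention (assumption): an all-zero column yields all-zero P(i | t)
            cond[i][t] = P[i][t] / p_t if p_t > 0 else 0.0
    return cond

def sample_state_sequence(cond, length, rng=random):
    """Draw s_t ~ P(. | t) independently for each frame (no Markov dependency)."""
    states = range(len(cond))
    return [rng.choices(states, weights=[cond[i][t] for i in states])[0]
            for t in range(length)]
```

For the table `[[0.2, 0.1, 0.05], [0.3, 0.25, 0.1]]`, state 1 is selected at $t = 1$ with probability 0.4, and each column of the result sums to 1.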
It can be seen that the state selection and transition process is a generalized Bernoulli process with probabilities $P_{S|T}(i|t)$.
Contrary to standard Bernoulli process which is a binary process
4101 1-4244-1484-9/08/$25.00 ©2008 IEEE ICASSP 2008