TIME-INHOMOGENEOUS HIDDEN BERNOULLI MODEL: AN ALTERNATIVE TO HIDDEN MARKOV MODEL FOR AUTOMATIC SPEECH RECOGNITION

Jahanshah Kabudian¹, M. Mehdi Homayounpour¹, S. Mohammad Ahadi²

¹ Department of Computer Engineering, ² Department of Electrical Engineering, AmirKabir University of Technology (Tehran Polytechnic), Tehran, IRAN.
{kabudian, homayoun, sma} at aut.ac.ir

ABSTRACT

In this paper, a new acoustic model called the Time-Inhomogeneous Hidden Bernoulli Model (TI-HBM) is introduced as an alternative to the Hidden Markov Model (HMM) in automatic speech recognition. In contrast to HMM, the state transition process in TI-HBM is not a Markov process; rather, it is an independent (generalized Bernoulli) process. This difference eliminates dynamic programming at the state level in the TI-HBM decoding process. Thus, the computational complexity of TI-HBM for Probability Evaluation and State Estimation is $\mathcal{O}(NL)$ (instead of $\mathcal{O}(N^2 L)$ in the HMM case). As a new framework for phone duration modeling, TI-HBM is able to model acoustic-unit duration (e.g. phone duration) by using a built-in parameter named survival probability. Similar to the HMM case, three essential problems in TI-HBM have been solved. An EM-algorithm based method has been proposed for training TI-HBM parameters. Experiments in phone recognition for the Persian (Farsi) spoken language show that TI-HBM has some advantages over HMM (e.g. greater simplicity and increased speed in the recognition phase), and also outperforms HMM in terms of phone recognition accuracy.

Index Terms: Time-Inhomogeneous Hidden Bernoulli Model, Hidden Markov Model, Speech Recognition, Acoustic Modeling, Phone Recognition, Phone Duration Modeling, Persian (Farsi) Spoken Language.

1. INTRODUCTION

The Hidden Markov Model (HMM) is the most popular and the most successful tool for analyzing and modeling stochastic sequences in speech processing [1].
The usual assumption in HMM is that the state transition process is a Markov process, and that the generated state sequence obeys a Markov regime. It has been shown experimentally that the state transition probabilities play a less important role than the observation density functions. There has been no attempt to relax the Markov dependency in acoustic models such as HMM. In this paper, a new acoustic model named TI-HBM is proposed, in which the Markov regime in the state transition process is relaxed. There have been many attempts at phone duration modeling [2,3,4]. TI-HBM models acoustic-unit duration (e.g. phone duration) by using a built-in parameter named survival probability, which is derived from the joint state-time distribution parameters. In the next sections, we introduce TI-HBM and its basic definitions.

2. TI-HBM

TI-HBM is a new acoustic model which is able to simultaneously model both state transition and acoustic-unit (e.g. phone) duration by using a new parameter called the Joint State-Time Distribution $P_{S,T}(i,t)$. The parameter $P(i,t)$ is the probability of being in state $i$ at time $t$. Therefore, the parameters of TI-HBM are:

1. Joint State-Time Distribution $P(i,t)$.
2. Parameters of Gaussian mixtures, i.e. $w_{im}$, $\mu_{im}$ and $C_{im}$.

The parameters $P(i,t)$ play roles similar to $\pi_i$ and $a_{ij}$ in standard HMM. The following constraints must be satisfied:

$$\sum_{i=1}^{N} \sum_{t=1}^{L_{max}} P(i,t) = 1 \tag{2.1}$$

$$P(i,t) = 0 \quad \text{for } t > L_{max} \tag{2.2}$$

where $L_{max}$ is the maximum length of the observation sequence $X$. We derive some useful parameters from $P(i,t)$ which are needed for employing TI-HBM in real-world applications:

1. Time distribution function $P_T(t)$ or $P(t)$: $P_T(t)$ is the probability of being at time $t$, which is computed as follows:

$$P(t) = \sum_{i=1}^{N} P(i,t) \tag{2.3}$$

If we have $K$ observation sequences, with length $L_k$ for the $k$-th observation sequence, the time distribution function is computed as the relative frequency of observation vectors with time index $t$ (frame number $t$).
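The definitions so far can be illustrated with a small numerical sketch. It is not the authors' implementation: the function names, the array layout (rows are states, columns are time indices 1..L_max) and the toy numbers are assumptions made here for illustration only. It checks the normalization constraint (2.1), computes the time marginal (2.3), and estimates the time distribution as the relative frequency of frames with time index t.

```python
import numpy as np

def is_valid_joint(P_joint, tol=1e-9):
    """Check constraint (2.1): entries are non-negative and sum to 1.

    P_joint[i, t-1] holds P(i, t) for t = 1..L_max. Constraint (2.2) is
    implicit in this layout: no column exists for t > L_max.
    """
    P_joint = np.asarray(P_joint, dtype=float)
    return bool(np.all(P_joint >= 0.0) and abs(P_joint.sum() - 1.0) < tol)

def time_distribution(P_joint):
    """Marginalize the joint table over states, as in (2.3)."""
    return np.asarray(P_joint, dtype=float).sum(axis=0)

def empirical_time_distribution(lengths, L_max):
    """Relative frequency of frames whose time index equals t, t = 1..L_max."""
    lengths = np.asarray(lengths)
    t = np.arange(1, L_max + 1)
    # counts[t-1] = number of sequences long enough to contain frame t
    counts = (t[None, :] <= lengths[:, None]).sum(axis=0)
    return counts / lengths.sum()

# Hypothetical toy model: N = 2 states, L_max = 3 frames.
P_joint = np.array([[0.30, 0.10, 0.05],
                    [0.20, 0.25, 0.10]])

print(is_valid_joint(P_joint))        # True
print(time_distribution(P_joint))     # approximately [0.5, 0.35, 0.15]

# Hypothetical training set: K = 3 sequences of lengths 1, 2 and 3 (6 frames).
print(empirical_time_distribution([1, 2, 3], L_max=3))  # [3/6, 2/6, 1/6]
```

Because every sequence contains frame 1 but only the longer ones contain later frames, the estimated time distribution is non-increasing in t.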
Therefore, the time distribution function $P_T(t)$ is empirically computed by the following formula:

$$\hat{P}(t) = \frac{\sum_{k=1}^{K} \mathbb{1}(t \le L_k)}{\sum_{k=1}^{K} L_k} \tag{2.4}$$

$$\mathbb{1}(cond) = \begin{cases} 1 & \text{if } cond \text{ is TRUE} \\ 0 & \text{if } cond \text{ is FALSE} \end{cases} \tag{2.5}$$

2. Survival probability $P_{T_{next}|T_{curr}}(t+1 \mid t)$ or $P(t+1 \mid t)$: Given that the process is at time $t$, $P(t+1 \mid t)$ is the probability that the process survives to time $t+1$. In other words, at time $t$, the process continues to time $t+1$ with probability $P(t+1 \mid t)$; otherwise it is terminated at time $t$ with probability $1 - P(t+1 \mid t)$. $P_{T_{next}|T_{curr}}(t+1 \mid t)$ is computed using the Bayes formulation as follows:

$$P_{T_{next}|T_{curr}}(t+1 \mid t) = \frac{P_{T_{next},T_{curr}}(t+1, t)}{P_{T_{curr}}(t)} \tag{2.6}$$

Since the sequence length $L_k$ is always greater than zero:

$$P_{T_{next}|T_{curr}}(1 \mid 0) = 1 \tag{2.7}$$

TI-HBM is able to model acoustic-unit duration using survival probabilities.

3. State selection probability given time $P_{S|T}(i \mid t)$ or $P(i \mid t)$: $P_{S|T}(i \mid t)$ is the probability of selecting state $i$ at time $t$, and is computed using the following formula:

$$P_{S|T}(i \mid t) = \frac{P_{S,T}(i,t)}{P_T(t)} = \frac{P_{S,T}(i,t)}{\sum_{j=1}^{N} P_{S,T}(j,t)} \tag{2.8}$$

It can be seen that the state selection and transition process is a generalized Bernoulli process with probabilities $P_{S|T}(i \mid t)$. Contrary to the standard Bernoulli process, which is a binary process

4101    1-4244-1484-9/08/$25.00 ©2008 IEEE    ICASSP 2008