A stochastic version of Expectation Maximization algorithm for better estimation of Hidden Markov Model

Shamsul Huda a,*, John Yearwood a, Roberto Togneri b

a Center for Informatics and Applied Optimization, School of Information Technology and Mathematical Science (ITMS), University of Ballarat, VIC 3353, Victoria, Australia
b Center for Intelligent Information Processing, School of Electrical, Electronic and Computer Engineering, University of Western Australia, WA, Australia
* Corresponding author. E-mail addresses: shuda9203@yahoo.com, shuda@ballarat.edu.au (S. Huda); j.yearwood@ballarat.edu.au (J. Yearwood); roberto@ee.uwa.edu.au (R. Togneri).

Article history: Received 28 November 2008; received in revised form 18 May 2009; available online 23 June 2009. Communicated by R.C. Guido.

Keywords: Hidden Markov Model; Expectation Maximization; Speech recognition; Constraint-based Evolutionary Algorithm; Stochastic EM

Abstract

This paper addresses the local convergence problem of Expectation Maximization (EM) based training of the Hidden Markov Model (HMM) in speech recognition. We propose a hybrid algorithm, a Simulated Annealing Stochastic version of EM (SASEM), that combines Simulated Annealing (SA) with EM and reformulates the HMM estimation process using a stochastic step between the EM steps and the SA. The stochastic processes of SASEM inside EM prevent EM from converging to a local maximum and yield improved estimates of the HMM through the global convergence properties of SA. Experiments on the TIMIT speech corpus show that SASEM obtains higher recognition accuracies than EM.

© 2009 Elsevier B.V. All rights reserved.

1. Introduction

The Hidden Markov Model (HMM) is the most successful statistical modeling technique in wide use for signal classification, Automatic Speech Recognition (ASR) (Rabiner, 1989; Levinson et al., 1983) and time series classification. This is because the HMM can statistically model the temporal nature of a signal and can represent arbitrarily complex probability density functions of the underlying system. In a Bayesian classification scenario (for signal classification or recognition), the HMM provides the conditional probability of the signal given its class label/phoneme label (where a phoneme is a basic theoretical unit of speech sound), from which the class posterior follows by Bayes' rule. The success of recognition/classification therefore depends heavily on how precisely the estimated HMM represents the underlying phoneme/signal classes in the training data.

The standard method of estimating the parameters of an HMM is the Expectation Maximization (EM) algorithm (Rabiner, 1989; Levinson et al., 1983; Dempster et al., 1977). EM is attractive for estimating the HMM, as well as many other probabilistic models such as Finite Mixture Models (FMM) and Gaussian Mixture Models (GMM) (McLachlan and Basford, 1988), because it is computationally efficient and can approximate the underlying distribution from a set of observed data that has missing or hidden components (Bilmes, 1998). Unfortunately, the HMM parameter estimates computed by the EM approach are not always the best (Rabiner, 1989; Levinson et al., 1983). The reason is that the EM algorithm depends strongly on the initial values of the model parameters: it increases the likelihood function at each iteration, which guarantees convergence only to a local rather than a global maximum of the likelihood (Rabiner, 1989; Levinson et al., 1983; Wu, 1983). This yields a non-optimal estimate of the HMM parameters and consequently lowers the recognition accuracy of ASR systems.
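To make this initialization sensitivity concrete, the following minimal sketch (ours, not from the paper) runs standard EM re-estimation on a toy one-dimensional, two-component Gaussian mixture from several starting points; the final log-likelihoods differ, showing convergence to different local maxima. A full HMM adds state and transition constraints, but the effect is the same. The names em_gmm and log_likelihood are illustrative, not from any library.

    import numpy as np

    rng = np.random.default_rng(0)
    # Synthetic 1-D data from a two-component mixture with unequal weights.
    data = np.concatenate([rng.normal(-2.0, 0.5, 700), rng.normal(3.0, 1.0, 300)])

    def log_likelihood(x, w, mu, sigma):
        # log p(x) = sum_n log sum_k w_k N(x_n | mu_k, sigma_k^2)
        comp = w * np.exp(-0.5 * ((x[:, None] - mu) / sigma) ** 2) \
               / (sigma * np.sqrt(2.0 * np.pi))
        return float(np.sum(np.log(np.maximum(comp.sum(axis=1), 1e-300))))

    def em_gmm(x, mu0, n_iter=200):
        w = np.array([0.5, 0.5])        # mixture weights
        mu = mu0.astype(float).copy()   # component means (the initialization under test)
        sigma = np.array([1.0, 1.0])    # component standard deviations
        for _ in range(n_iter):
            # E-step: responsibility of each component for each sample.
            comp = w * np.exp(-0.5 * ((x[:, None] - mu) / sigma) ** 2) \
                   / (sigma * np.sqrt(2.0 * np.pi))
            comp = np.maximum(comp, 1e-300)
            resp = comp / comp.sum(axis=1, keepdims=True)
            # M-step: closed-form re-estimation of weights, means, variances.
            nk = resp.sum(axis=0)
            w = nk / len(x)
            mu = (resp * x[:, None]).sum(axis=0) / nk
            sigma = np.sqrt((resp * (x[:, None] - mu) ** 2).sum(axis=0) / nk)
            sigma = np.maximum(sigma, 1e-3)   # guard against component collapse
        return w, mu, sigma

    for mu0 in ([-1.0, 1.0], [4.0, 5.0], [0.0, 0.1]):
        w, mu, sigma = em_gmm(data, np.array(mu0))
        print(mu0, "->", round(log_likelihood(data, w, mu, sigma), 1))

Each run increases the likelihood monotonically, yet the three runs end at different stationary points; no amount of further EM iteration moves a run out of the basin its initialization placed it in.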
To circumvent the local convergence problem of EM, several investigators have recently applied hybrid algorithms that combine an Evolutionary Algorithm (EA) with EM for the optimal estimation of a Gaussian Mixture Model (GMM) in non-linear classification and unsupervised clustering problems. The hybrid algorithms in (Martinez and Vitria, 2000; Martinez and Vitria, 2001; Pernkopf et al., 2005; Majdi-Nasab et al., 2006) ignore the constraints of the GMM and assume equal mixture weights, which may fail in many practical situations where the mixture weights of the individual mixtures are not the same. Therefore, these algorithms cannot be applied directly to the estimation of HMM parameters.
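As a rough illustration of the hybrid idea, and assuming the toy em_gmm and log_likelihood from the sketch above, the following generic loop interleaves a stochastic perturbation and a temperature-controlled Metropolis acceptance (standard simulated annealing ingredients) with short EM runs. This is only a sketch in the spirit of SASEM, not the algorithm developed in this paper, which embeds the stochastic step inside the EM re-estimation of the constrained HMM parameters.

    rng2 = np.random.default_rng(1)

    def sa_em(x, mu0, t0=50.0, cooling=0.85, n_outer=30):
        # Short EM run from the starting point gives the current solution.
        w, mu, sigma = em_gmm(x, mu0, n_iter=10)
        cur_ll = log_likelihood(x, w, mu, sigma)
        temp = t0
        for _ in range(n_outer):
            # Stochastic step: perturb the current means; the step size
            # shrinks as the temperature cools.
            cand_mu = mu + rng2.normal(0.0, 2.0 * temp / t0, size=mu.shape)
            cw, cmu, cs = em_gmm(x, cand_mu, n_iter=10)   # EM refines the candidate
            cand_ll = log_likelihood(x, cw, cmu, cs)
            # Metropolis acceptance: always take improvements; accept worse
            # candidates with probability exp(delta / temp) while still hot.
            if cand_ll > cur_ll or rng2.random() < np.exp((cand_ll - cur_ll) / temp):
                w, mu, sigma, cur_ll = cw, cmu, cs, cand_ll
            temp *= cooling   # cool the annealing schedule
        return w, mu, sigma, cur_ll

    w, mu, sigma, ll = sa_em(data, np.array([4.0, 5.0]))
    print("SA-EM:", round(ll, 1), mu)

The design point is that the annealing schedule lets the search accept occasional downhill moves early on, which is exactly what plain EM forbids and what allows escape from the local maxima seen in the previous sketch.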