Zero knowledge hidden Markov model inference

J.M. Schwier a,1, R.R. Brooks a,*,2, C. Griffin b,3, S. Bukkapatnam c,4

a Holcombe Department of Electrical and Computer Engineering, Clemson University, P.O. Box 340915, Clemson, SC 29634, United States
b Communications, Navigation and Information Office, The Applied Research Laboratory, The Pennsylvania State University, University Park, PA 16804, United States
c Oklahoma State University, School of Industrial Engineering and Management, EN 322, Stillwater, OK 74078, United States

Article info
Article history: Received 17 October 2008; received in revised form 12 May 2009; available online 27 June 2009
Communicated by O. Siohan
Keywords: Pattern recognition; Hidden Markov model; Pattern discovery

Abstract
Hidden Markov models (HMMs) are widely used in pattern recognition. HMM construction requires an initial model structure that is used as a starting point to estimate the model's parameters. To construct an HMM without a priori knowledge of its structure, we use an approach developed by Crutchfield and Shalizi that requires only a sequence of observations and a maximum data window size. Values of the maximum data window size that are too small cause incorrect models to be constructed. Values that are too large reduce the number of data samples that can be considered and increase the algorithm's computational complexity exponentially. In this paper, we present a method for automatically inferring this parameter directly from training data as part of the model construction process. We present theoretical and experimental results that confirm the utility of the proposed extension.
© 2009 Elsevier B.V. All rights reserved.

1. Introduction

Hidden Markov models (HMMs) are a common tool in pattern recognition.
Applications of HMMs include voice recognition (Rabiner, 1989; Damper and Higgins, 2003), texture recognition (Chen and Kundu, 1994), handwriting recognition (Xue and Govindaraju, 2006; Mozaffari et al., 2008), gait recognition (Liu and Sarkar, 2006), tracking (Chen et al., 2006), and human behavior recognition (Yang et al., 1997; Hu et al., 2002). We refer the reader to He et al. (2008) for a more detailed review of HMM applications and of training and classification approaches.

Traditionally, the Baum–Welch algorithm is used to infer the state transition matrix of a Markov chain and the symbol output probabilities associated with the states of the chain, given an initial Markov model and a sequence of symbolic output values (see Rabiner (1989)). The Baum–Welch algorithm uses dynamic programming (the forward–backward procedure) to solve a non-linear optimization problem: locally maximizing the likelihood of the observed sequence. The fundamental approach of constructing Markov models from data streams has been heavily researched for specific applications. The methods in Deng and Erler (1992) and Deng and Sun (1994), for example, illustrate construction and training in speech recognition applications.

To construct a Markov model without a priori structural information, we use an approach developed by Shalizi and Crutchfield (2001) and Shalizi (2001, 2002), which derives the HMM state structure and transition matrix from the available data samples. Other approaches, such as that of Ostendorf and Singer (1997), may be used to construct models from data streams for specific areas such as speech recognition. In this work, we only consider the approach of Shalizi et al.

Shalizi's approach finds statistically significant groupings of the training data that correspond to HMM states. It does this by analyzing the conditional next-symbol probabilities for a data window that slides over the training data. The size of this data window increases gradually from two up to an a priori known maximum window size L.
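The sliding-window statistics described above can be sketched as follows. This is a minimal Python sketch and not Shalizi and Crutchfield's implementation. The function names are hypothetical, and we assume that a window of size w covers a history of w − 1 symbols plus the symbol that follows it. Under that assumption, windows of size 2 through L condition on histories of length 1 through L − 1. The sketch only tabulates empirical conditional next-symbol probabilities. The statistical tests that group histories into states are omitted.

```python
from collections import Counter, defaultdict


def next_symbol_counts(seq, max_window):
    """Tabulate next-symbol counts for every history seen in the data.

    Hypothetical helper. Assumes a window of size w holds a history of
    w - 1 symbols followed by the next symbol, for w = 2 .. max_window.
    """
    counts = defaultdict(Counter)
    for w in range(2, max_window + 1):
        # Slide a window of size w over the training data.
        for i in range(len(seq) - w + 1):
            history = tuple(seq[i:i + w - 1])
            counts[history][seq[i + w - 1]] += 1
    return counts


def next_symbol_probs(counts):
    """Normalize counts into empirical P(next symbol | history)."""
    probs = {}
    for history, nexts in counts.items():
        total = sum(nexts.values())
        probs[history] = {s: n / total for s, n in nexts.items()}
    return probs


if __name__ == "__main__":
    probs = next_symbol_probs(next_symbol_counts("abbabbabb", max_window=3))
    # Print shorter histories first, then longer ones.
    for history in sorted(probs, key=lambda h: (len(h), h)):
        print("".join(history), probs[history])
```

On the toy sequence abbabbabb with a maximum window size of 3, the history b is followed by a with probability 2/5 and by b with probability 3/5. Every history of length 2, however, determines the next symbol. A window that is too small therefore merges contexts that should be distinct. Conversely, an alphabet of k symbols admits up to k^(L−1) histories of length L − 1. Each increase of L can therefore multiply the size of the table by k while spreading the same training data over more histories.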
Except for the training data, the only initial information required to construct the HMM using their approach is the parameter L. The parameter L expresses the maximum number of symbols that are statistically relevant to the next symbol in the sequence. The state structure of the Markov model is inferred from the symbol groupings of length ≤ L by adding to the model those states that lower system entropy (Shalizi et al., 2002; Shalizi and Shalizi, 2004). To date, no one has considered how to determine the parameter L dynamically. We extend the work of Crutchfield and Shalizi so that L is determined with no prior knowledge. We can therefore derive minimum entropy HMMs with no a priori information.

The remainder of this paper is organized as follows. Section 2 provides background on hidden Markov models. Section 3 explains Crutchfield's and Shalizi's HMM inference algorithm. Our proposed algorithm for identifying L for zero knowledge inference of HMMs

0167-8655/$ - see front matter © 2009 Elsevier B.V. All rights reserved.
doi:10.1016/j.patrec.2009.06.008
* Corresponding author. Tel.: +1 864 656 0921; fax: +1 864 656 5910.
E-mail addresses: js7@clemson.edu (J.M. Schwier), rrb@acm.org (R.R. Brooks), griffinch@ieee.org (C. Griffin), satish.t.bukkapatnam@okstate.edu (S. Bukkapatnam).
1 Jason M. Schwier is a Ph.D. student at Clemson University.
2 Richard R. Brooks is an Associate Professor at Clemson University.
3 Christopher Griffin is a Research Associate at the Applied Research Laboratory at the Pennsylvania State University.
4 Satish Bukkapatnam is an Associate Professor at Oklahoma State University.
Pattern Recognition Letters 30 (2009) 1273–1280
Journal homepage: www.elsevier.com/locate/patrec
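The role of L as the number of past symbols that are statistically relevant to the next symbol can be illustrated with the empirical conditional entropy H(X_t | X_{t−k}, ..., X_{t−1}) as a function of the history length k. The Python sketch below is an illustration only and not the authors' procedure for inferring L. The function name conditional_entropy is hypothetical, and entropies are estimated with simple plug-in frequencies.

```python
import math
from collections import Counter, defaultdict


def conditional_entropy(seq, k):
    """Plug-in estimate of H(next symbol | previous k symbols), in bits.

    Hypothetical illustration; k = 0 gives the unconditional entropy.
    """
    # Count next symbols for each length-k history.
    joint = defaultdict(Counter)
    for i in range(k, len(seq)):
        joint[tuple(seq[i - k:i])][seq[i]] += 1
    total = len(seq) - k
    h = 0.0
    for nexts in joint.values():
        n_hist = sum(nexts.values())
        for n in nexts.values():
            p = n / n_hist  # empirical P(next | history)
            # Weight each history's entropy by its empirical frequency.
            h -= (n_hist / total) * p * math.log2(p)
    return h


if __name__ == "__main__":
    seq = "abb" * 200
    for k in range(5):
        print(k, round(conditional_entropy(seq, k), 3))
```

For the periodic sequence (abb)^200, the estimate falls from about 0.918 bits at k = 0 to about 0.666 bits at k = 1 and reaches 0 at k = 2. Longer histories add no further information, so two past symbols suffice for this process.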