IEEE TRANSACTIONS ON SYSTEMS, MAN, AND CYBERNETICS—PART B: CYBERNETICS, VOL. 39, NO. 6, DECEMBER 2009

Behavior Detection Using Confidence Intervals of Hidden Markov Models

Richard R. Brooks, Senior Member, IEEE, Jason M. Schwier, and Christopher Griffin, Member, IEEE

Abstract—Markov models are commonly used to analyze real-world problems. Their combination of discrete states and stochastic transitions is suited to applications with deterministic and stochastic components. Hidden Markov models (HMMs) are a class of Markov models commonly used in pattern recognition. Currently, HMMs recognize patterns using a maximum-likelihood approach. One major drawback of this approach is that data observations are mapped to HMMs without considering the number of data samples available. Another problem is that this approach is only useful for choosing between HMMs: it provides no criterion for determining whether or not a given HMM adequately matches the data stream. In this paper, we recognize complex behaviors using HMMs and confidence intervals. The certainty of a data match increases with the number of data samples considered. Receiver operating characteristic curves are used to find the optimal threshold for either accepting or rejecting an HMM description. We present one example using a family of HMMs to show the utility of the proposed approach. A second example, using models extracted from a database of consumer purchases, provides additional evidence that this approach can perform better than existing techniques.

Index Terms—Confidence intervals, forward–backward procedure, hidden Markov models (HMMs), receiver operating characteristic (ROC) analysis.

I. INTRODUCTION

Hidden Markov models (HMMs) are extensively used for pattern recognition applications, such as handwriting recognition [1], [2], speech recognition [3], and gait recognition [4]. HMMs are appropriate models of systems with deterministic and stochastic components.
In this paper, we solve the problem of detecting a behavior in a sensor data stream using HMMs. Traditionally, HMMs are used for data classification, which assigns the observed data stream to one of a known set of models. This is most commonly done using a maximum-likelihood approach [5]. Although detection and classification are similar problems in many respects, detection is subtly different from classification. By definition, classification always returns one (and exactly one) model that matches the data stream. Detection may find that no model matches the data stream; it may also return more than one model.

The primary innovation in this paper is the use of confidence intervals for HMM analysis. This makes it possible to take the number of available data samples into account when comparing an HMM with a sensor data stream. Our use of receiver operating characteristic (ROC) curves to find detection thresholds when confidence intervals are used is also novel.

Manuscript received August 7, 2008; revised December 16, 2008. First published May 2, 2009; current version published November 18, 2009. This work was supported in part by the Office of Naval Research (Code 311) under Contract N00014-06-C-0022. The work of C. Griffin was supported by the Oak Ridge National Laboratory, managed by UT-Battelle, LLC, for the U.S. Department of Energy under Contract DE-AC05-00OR22725. This paper was recommended by Associate Editor V. Murino.
R. R. Brooks and J. M. Schwier are with The Holcombe Department of Electrical and Computer Engineering, Clemson University, Clemson, SC 29634 USA (e-mail: rrb@acm.org; jschwie@clemson.edu).
C. Griffin is with the Applied Research Laboratory, The Pennsylvania State University, University Park, PA 16802 USA (e-mail: griffinch@ieee.org).
Digital Object Identifier 10.1109/TSMCB.2009.2019732
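As a point of reference for the maximum-likelihood classification described above, the following sketch (not code from the paper; the models and observation sequence are invented for illustration) scores an observation sequence against several candidate HMMs with a scaled forward algorithm and returns the best-scoring model:

```python
import numpy as np

def log_likelihood(obs, pi, A, B):
    """Scaled forward algorithm: returns log P(obs | model).

    pi : (n,) initial state distribution
    A  : (n, n) state-transition matrix
    B  : (n, m) emission matrix, B[i, k] = P(symbol k | state i)
    obs: sequence of symbol indices
    """
    alpha = pi * B[:, obs[0]]
    c = alpha.sum()                 # scaling factor avoids underflow
    logp = np.log(c)
    alpha = alpha / c
    for t in range(1, len(obs)):
        alpha = (alpha @ A) * B[:, obs[t]]
        c = alpha.sum()
        logp += np.log(c)
        alpha = alpha / c
    return logp

def classify(obs, models):
    """Maximum-likelihood classification: always returns exactly one
    model name, even when none of the models fits the data well."""
    return max(models, key=lambda name: log_likelihood(obs, *models[name]))
```

Note that `classify` illustrates the limitation the paper raises: it always returns one model, with no criterion for rejecting all of them, and the score does not reflect how many samples were observed.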
The outline of this paper is as follows. Section II provides background on Markov models and the commonly used maximum-likelihood approach, and discusses issues with matching sequences to Markov models. We explain how to calculate confidence intervals in Section III. In Section IV, we use an illustrative example to explain the testing procedure and provide results for our confidence-interval approach, contrasting them with the performance of the maximum-likelihood approach. Section V shows the performance of the confidence-interval approach on consumer activity data. A summary with suggestions for future work is given in Section VI.

II. BACKGROUND INFORMATION

A. Markov Models

A Markov model is a tuple λ = (V, E, L, φ), where V is a set of vertices of a graph, E is a set of directed edges between the vertices, L is a set of labels, and φ : E → L is a labeling of the edges. A path through λ with label χ = χ_1, χ_2, ..., χ_n is an ordered set of vertices (v_1, v_2, ..., v_{n+1}) such that for each pair of consecutive vertices (v_i, v_{i+1}): 1) (v_i, v_{i+1}) ∈ E; 2) φ(v_i, v_{i+1}) = χ_i. In Markov models, the vertices of λ are referred to as states, and the edges are referred to as transitions, where V is the state space of size n, and P is the n × n transition matrix. Each element p_{i,j} ∈ P expresses the probability that the process transitions to state j once it is in state i. If (v_i, v_j) ∉ E, then we assume p_{i,j} = 0, and for any i, Σ_j p_{i,j} = 1. The fundamental property of Markov models is that they are "memoryless": the conditional probability of a transition to a new state depends only on the current state, not on the path taken to reach the current state.

We require that Markov models be deterministic in the transition label, i.e., if there is a pair (v_i, v_j) with φ(v_i, v_j) = χ_i, then φ(v_i, v_k) ≠ χ_i for all k ≠ j. Therefore, the probability that a particular symbol will occur next is the probability p_{i,j} of
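The definitions above can be sketched concretely. In the following toy model (a hypothetical example, not taken from the paper), each state maps each outgoing label to a (next state, probability) pair, so determinism in the transition label holds by construction, and a memoryless walk generates a label sequence:

```python
import random

# Toy label-deterministic Markov model (hypothetical): edges[i] maps
# each label to (next_state, probability). Because dict keys are unique,
# at most one edge with a given label leaves each state, which is the
# "deterministic in the transition label" requirement.
edges = {
    0: {"a": (0, 0.7), "b": (1, 0.3)},
    1: {"a": (0, 0.4), "c": (1, 0.6)},
}

def rows_sum_to_one(edges):
    """Check that outgoing probabilities from every state sum to 1."""
    return all(abs(sum(p for _, p in out.values()) - 1.0) < 1e-9
               for out in edges.values())

def next_symbol_prob(state, symbol):
    """P(next symbol | current state), i.e., p_{i,j} of the matching edge;
    0 when no edge from this state carries the symbol."""
    hop = edges[state].get(symbol)
    return hop[1] if hop else 0.0

def sample_path(start, length, seed=0):
    """Memoryless walk: each step depends only on the current state."""
    rng = random.Random(seed)
    state, labels = start, []
    for _ in range(length):
        symbols = list(edges[state])
        weights = [edges[state][s][1] for s in symbols]
        symbol = rng.choices(symbols, weights=weights)[0]
        labels.append(symbol)
        state = edges[state][symbol][0]
    return labels
```

Because the model is label-deterministic, a label sequence emitted from a known start state identifies the visited states uniquely, which is what makes the next-symbol probability simply p_{i,j}.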