IEEE TRANSACTIONS ON SYSTEMS, MAN, AND CYBERNETICS—PART B: CYBERNETICS, VOL. 39, NO. 6, DECEMBER 2009

Behavior Detection Using Confidence Intervals of Hidden Markov Models

Richard R. Brooks, Senior Member, IEEE, Jason M. Schwier, and Christopher Griffin, Member, IEEE

Abstract—Markov models are commonly used to analyze real-world problems. Their combination of discrete states and stochastic transitions is suited to applications with deterministic and stochastic components. Hidden Markov models (HMMs) are a class of Markov models commonly used in pattern recognition. Currently, HMMs recognize patterns using a maximum-likelihood approach. One major drawback of this approach is that data observations are mapped to HMMs without considering the number of data samples available. Another problem is that this approach is only useful for choosing between HMMs: it provides no criterion for determining whether or not a given HMM adequately matches the data stream. In this paper, we recognize complex behaviors using HMMs and confidence intervals. The certainty of a data match increases with the number of data samples considered. Receiver operating characteristic curves are used to find the optimal threshold for either accepting or rejecting an HMM description. We present one example using a family of HMMs to show the utility of the proposed approach. A second example, using models extracted from a database of consumer purchases, provides additional evidence that this approach can perform better than existing techniques.

Index Terms—Confidence intervals, forward–backward procedure, hidden Markov models (HMMs), receiver operating characteristic (ROC) analysis.

I. INTRODUCTION

Hidden Markov models (HMMs) are extensively used for pattern recognition applications, such as handwriting recognition [1], [2], speech recognition [3], and gait recognition [4]. HMMs are appropriate models of systems with deterministic and stochastic components.
In this paper, we solve the problem of detecting a behavior in a sensor data stream using HMMs. Traditionally, HMMs are used for data classification, which assigns the observed data stream to one of a known set of models. This is most commonly done using a maximum-likelihood approach [5]. Although detection and classification are similar problems in many respects, detection is subtly different from classification. By definition, classification always returns one (and exactly one) model that matches the data stream. Detection may find that no model matches the data stream; it may also return more than one model.

The primary innovation in this paper is the use of confidence intervals for HMM analysis. This makes it possible to take the number of available data samples into account when comparing an HMM with a sensor data stream. Our use of receiver operating characteristic (ROC) curves to find detection thresholds when confidence intervals are used is also novel.

Manuscript received August 7, 2008; revised December 16, 2008. First published May 2, 2009; current version published November 18, 2009. This work was supported in part by the Office of Naval Research (Code 311) under Contract N00014-06-C-0022. The work of C. Griffin was supported by the Oak Ridge National Laboratory, managed by UT-Battelle, LLC, for the U.S. Department of Energy under Contract DE-AC05-00OR22725. This paper was recommended by Associate Editor V. Murino.
R. R. Brooks and J. M. Schwier are with The Holcombe Department of Electrical and Computer Engineering, Clemson University, Clemson, SC 29634 USA (e-mail: rrb@acm.org; jschwie@clemson.edu).
C. Griffin is with the Applied Research Laboratory, The Pennsylvania State University, University Park, PA 16802 USA (e-mail: griffinch@ieee.org).
Digital Object Identifier 10.1109/TSMCB.2009.2019732
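As a point of reference for the maximum-likelihood classification described above, the following sketch (not code from the paper; the models and observation sequence are invented for illustration) scores an observation sequence against several candidate HMMs with a scaled forward algorithm and returns the best-scoring model:

```python
import numpy as np

def log_likelihood(obs, pi, A, B):
    """Scaled forward algorithm: returns log P(obs | model).

    pi : (n,) initial state distribution
    A  : (n, n) state-transition matrix
    B  : (n, m) emission matrix, B[i, k] = P(symbol k | state i)
    obs: sequence of symbol indices
    """
    alpha = pi * B[:, obs[0]]
    c = alpha.sum()                 # scaling factor avoids underflow
    logp = np.log(c)
    alpha = alpha / c
    for t in range(1, len(obs)):
        alpha = (alpha @ A) * B[:, obs[t]]
        c = alpha.sum()
        logp += np.log(c)
        alpha = alpha / c
    return logp

def classify(obs, models):
    """Maximum-likelihood classification: always returns exactly one
    model name, even when none of the models fits the data well."""
    return max(models, key=lambda name: log_likelihood(obs, *models[name]))
```

Note that `classify` illustrates the limitation the paper raises: it always returns one model, with no criterion for rejecting all of them, and the score does not reflect how many samples were observed.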
The outline of this paper is as follows. Section II provides background on Markov models and the commonly used maximum-likelihood approach, and discusses issues with matching sequences to Markov models. We explain how to calculate confidence intervals in Section III. In Section IV, we use an illustrative example to explain the testing procedure and provide results for our confidence-interval approach, contrasting them with the performance of the maximum-likelihood approach. Section V shows the performance of the confidence-interval approach on consumer activity data. A summary with suggestions for future work is given in Section VI.

II. BACKGROUND INFORMATION

A. Markov Models

A Markov model is a tuple λ = (V, E, L, φ), where V is a set of vertices of a graph, E is a set of directed edges between the vertices, L is a set of labels, and φ : E → L is a labeling of the edges. A path through λ with label χ = χ_1, χ_2, ..., χ_n is an ordered set of vertices (v_1, v_2, ..., v_{n+1}) such that for each pair of consecutive vertices (v_i, v_{i+1}): 1) (v_i, v_{i+1}) ∈ E; 2) φ(v_i, v_{i+1}) = χ_i. In Markov models, the vertices of λ are referred to as states, and the edges are referred to as transitions, where V is the state space of size n, and P is the n × n transition matrix. Each element p_{i,j} ∈ P expresses the probability that the process transitions to state j once it is in state i. If (v_i, v_j) ∉ E, then we assume p_{i,j} = 0, and for any i, Σ_j p_{i,j} = 1. The fundamental property of Markov models is that they are "memoryless": the conditional probability of a transition to a new state depends only on the current state, not on the path taken to reach the current state.

We require that Markov models be deterministic in the transition label, i.e., if there is a pair (v_i, v_j) with φ(v_i, v_j) = χ_i, then φ(v_i, v_k) ≠ χ_i for all k ≠ j. Therefore, the probability that a particular symbol will occur next is the probability p_{i,j} of
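The definitions above can be sketched concretely. In the following toy model (a hypothetical example, not taken from the paper), each state maps each outgoing label to a (next state, probability) pair, so determinism in the transition label holds by construction, and a memoryless walk generates a label sequence:

```python
import random

# Toy label-deterministic Markov model (hypothetical): edges[i] maps
# each label to (next_state, probability). Because dict keys are unique,
# at most one edge with a given label leaves each state, which is the
# "deterministic in the transition label" requirement.
edges = {
    0: {"a": (0, 0.7), "b": (1, 0.3)},
    1: {"a": (0, 0.4), "c": (1, 0.6)},
}

def rows_sum_to_one(edges):
    """Check that outgoing probabilities from every state sum to 1."""
    return all(abs(sum(p for _, p in out.values()) - 1.0) < 1e-9
               for out in edges.values())

def next_symbol_prob(state, symbol):
    """P(next symbol | current state), i.e., p_{i,j} of the matching edge;
    0 when no edge from this state carries the symbol."""
    hop = edges[state].get(symbol)
    return hop[1] if hop else 0.0

def sample_path(start, length, seed=0):
    """Memoryless walk: each step depends only on the current state."""
    rng = random.Random(seed)
    state, labels = start, []
    for _ in range(length):
        symbols = list(edges[state])
        weights = [edges[state][s][1] for s in symbols]
        symbol = rng.choices(symbols, weights=weights)[0]
        labels.append(symbol)
        state = edges[state][symbol][0]
    return labels
```

Because the model is label-deterministic, a label sequence emitted from a known start state identifies the visited states uniquely, which is what makes the next-symbol probability simply p_{i,j}.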