1484 IEEE TRANSACTIONS ON SYSTEMS, MAN, AND CYBERNETICS—PART B: CYBERNETICS, VOL. 39, NO. 6, DECEMBER 2009
Behavior Detection Using Confidence Intervals of
Hidden Markov Models
Richard R. Brooks, Senior Member, IEEE, Jason M. Schwier, and Christopher Griffin, Member, IEEE
Abstract—Markov models are commonly used to analyze
real-world problems. Their combination of discrete states and
stochastic transitions is suited to applications with deterministic
and stochastic components. Hidden Markov models (HMMs) are
a class of Markov models commonly used in pattern recognition.
Currently, HMMs recognize patterns using a maximum-likelihood
approach. One major drawback with this approach is that data
observations are mapped to HMMs without considering the num-
ber of data samples available. Another problem is that this ap-
proach is only useful for choosing between HMMs. It does not
provide a criterion for determining whether or not a given HMM
adequately matches the data stream. In this paper, we recognize
complex behaviors using HMMs and confidence intervals. The
certainty of a data match increases with the number of data
samples considered. Receiver operating characteristic curves are
used to find the optimal threshold for either accepting or rejecting
an HMM description. We present one example using a family of
HMMs to show the utility of the proposed approach. A second
example using models extracted from a database of consumer
purchases provides additional evidence that this approach can
perform better than existing techniques.
Index Terms—Confidence intervals, forward–backward proce-
dure, hidden Markov models (HMMs), receiver operating charac-
teristic (ROC) analysis.
I. INTRODUCTION

HIDDEN Markov models (HMMs) are extensively used
for pattern recognition applications, such as handwriting
recognition [1], [2], speech recognition [3], and gait recognition
[4]. HMMs are appropriate models of systems with determinis-
tic and stochastic components.
In this paper, we solve the problem of detecting a behavior in
a sensor data stream using HMMs. Traditionally, HMMs are
used for data classification, which assigns the observed data
stream to one of a known set of models. This is most commonly
done using a maximum-likelihood approach [5].
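The maximum-likelihood classification step can be sketched as follows: score the observed symbol sequence against each candidate HMM with the (scaled) forward procedure and return the model with the highest log-likelihood. This is a minimal NumPy sketch, not the paper's implementation; the model names and matrices below are invented for illustration.

```python
import numpy as np

def forward_log_likelihood(obs, pi, A, B):
    """Scaled forward procedure: returns log P(obs | lambda).

    pi: (n,) initial state distribution
    A:  (n, n) state transition matrix
    B:  (n, m) emission matrix, B[i, k] = P(symbol k | state i)
    obs: sequence of symbol indices
    """
    alpha = pi * B[:, obs[0]]
    c = alpha.sum()
    log_p = np.log(c)
    alpha /= c
    for o in obs[1:]:
        alpha = (alpha @ A) * B[:, o]   # induction step
        c = alpha.sum()                 # scaling factor avoids underflow
        log_p += np.log(c)
        alpha /= c
    return log_p

def classify(obs, models):
    """Maximum-likelihood classification over a dict of
    name -> (pi, A, B). Note: this always returns some model,
    even when none fits the data well."""
    return max(models, key=lambda name: forward_log_likelihood(obs, *models[name]))
```

As the code comment notes, `classify` always returns exactly one model, which is precisely the limitation of the maximum-likelihood approach that motivates a detection criterion.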
Although detection and classification are similar problems in
many respects, we note that detection is subtly different from
classification. By definition, classification always returns one
(and exactly one) model that matches the data stream. Detection
may find that no model matches the data stream. It may also
return more than one model.

Manuscript received August 7, 2008; revised December 16, 2008. First
published May 2, 2009; current version published November 18, 2009. This
work was supported in part by the Office of Naval Research (Code 311) under
Contract N00014-06-C-0022. The work of C. Griffin was supported by the
Oak Ridge National Laboratory, managed by UT-Battelle, LLC, for the U.S.
Department of Energy under Contract DE-AC05-00OR22725. This paper was
recommended by Associate Editor V. Murino.

R. R. Brooks and J. M. Schwier are with The Holcombe Department of
Electrical and Computer Engineering, Clemson University, Clemson, SC 29634
USA (e-mail: rrb@acm.org; jschwie@clemson.edu).

C. Griffin is with the Applied Research Laboratory, The Pennsylvania State
University, University Park, PA 16802 USA (e-mail: griffinch@ieee.org).

Digital Object Identifier 10.1109/TSMCB.2009.2019732
The primary innovation in this paper is using confidence
intervals for HMM analysis. This has the advantage of being
able to consider the number of data samples available when
comparing an HMM with a sensor data stream. Our use
of receiver operating characteristic (ROC) curves to find detec-
tion thresholds when confidence intervals are used is also novel.
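The ROC-based threshold search can be illustrated as follows: sweep candidate thresholds over detection scores from sequences that do and do not match a model, and keep the threshold with the best true-positive/false-positive trade-off. This sketch uses Youden's J statistic (TPR − FPR) as the operating-point criterion, which is one common choice; the paper's own criterion and score values may differ, and the data here are invented.

```python
import numpy as np

def best_threshold(pos_scores, neg_scores):
    """Sweep every observed score as a candidate threshold and
    return the one maximizing Youden's J = TPR - FPR."""
    thresholds = np.unique(np.concatenate([pos_scores, neg_scores]))
    best_t, best_j = None, -1.0
    for t in thresholds:
        tpr = np.mean(pos_scores >= t)   # true positive rate at t
        fpr = np.mean(neg_scores >= t)   # false positive rate at t
        j = tpr - fpr
        if j > best_j:
            best_j, best_t = j, t
    return best_t, best_j
```

Scores above the returned threshold are accepted as matching the model; scores below it are rejected, so the procedure can also decide that no model matches.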
The outline of this paper is given as follows: Section II pro-
vides a background on Markov models and the commonly used
maximum-likelihood approach. We discuss issues with match-
ing sequences and Markov models. We explain how to calculate
confidence intervals in Section III. In Section IV, we use an
illustrative example to explain the testing procedure and pro-
vide results for our confidence interval approach. We contrast
these results with the performance of the maximum-likelihood
approach. Section V shows the performance of the confidence
interval approach using consumer activity data. A summary
with suggestions for future work is given in Section VI.
II. BACKGROUND INFORMATION
A. Markov Models
A Markov model is a tuple λ = (V, E, L, φ), where V is a
set of vertices of a graph, E is a set of directed edges between
the vertices, L is a set of labels, and φ : E → L is a labeling of
the edges. A path through λ with label χ = χ_1, χ_2, ..., χ_n is
an ordered set of vertices (v_1, v_2, ..., v_{n+1}) such that for each
pair of consecutive vertices (v_i, v_{i+1}):
1) (v_i, v_{i+1}) ∈ E;
2) φ(v_i, v_{i+1}) = χ_i.
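Under this definition, a label sequence is produced by walking the edges of the labeled graph. A toy sketch (the graph, vertex names, and labels are invented for illustration):

```python
# phi maps each directed edge (u, v) to its label.
edges = {("s0", "s1"): "a", ("s1", "s2"): "b", ("s2", "s0"): "c"}

def path_label(vertices):
    """Return the label of the path (v_1, ..., v_{n+1}), or None
    if some consecutive pair of vertices is not an edge."""
    symbols = []
    for u, v in zip(vertices, vertices[1:]):
        if (u, v) not in edges:
            return None            # condition 1) violated
        symbols.append(edges[(u, v)])  # condition 2): collect phi(u, v)
    return "".join(symbols)
```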
In Markov models, the vertices of λ are referred to as states,
and the edges are referred to as transitions, where V is the
state space of size n, and P is the n × n transition matrix.
Each element p_{i,j} ∈ P expresses the probability that the process
transitions to state j once it is in state i. If (v_i, v_j) ∉ E, then we
assume p_{i,j} = 0, and for any i, Σ_j p_{i,j} = 1. The fundamental
property of Markov models is that they are “memoryless.”
The conditional probability of a transition to a new state only
depends on the current state, not on the path taken to reach the
current state.
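These constraints can be checked and exercised directly: each row of the transition matrix must sum to one, and a sampled path depends only on the current state at each step. A small sketch with an invented 3-state matrix:

```python
import numpy as np

# Row-stochastic transition matrix: P[i, j] = P(next = j | current = i).
# Zero entries correspond to pairs (v_i, v_j) not in E.
P = np.array([[0.9, 0.1, 0.0],
              [0.0, 0.5, 0.5],
              [0.3, 0.0, 0.7]])
assert np.allclose(P.sum(axis=1), 1.0)  # for any i, sum_j p_{i,j} = 1

def simulate(P, start, steps, rng):
    """Sample a state path. The next state is drawn from the row of
    the current state only -- the memoryless property in action."""
    path = [start]
    for _ in range(steps):
        path.append(rng.choice(len(P), p=P[path[-1]]))
    return path
```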
We require that Markov models be deterministic in the tran-
sition label, i.e., if there is a pair (v_i, v_j) with φ(v_i, v_j) = χ_i,
then φ(v_i, v_k) ≠ χ_i ∀ k ≠ j. Therefore, the probability that a
particular symbol will occur next is the probability p_{i,j} of
1083-4419/$25.00 © 2009 IEEE