MODEL ASSESSMENT WITH KOLMOGOROV–SMIRNOV STATISTICS
Petar M. Djurić (1) and Joaquín Míguez (2)

(1) Department of Electrical and Computer Engineering
Stony Brook University, Stony Brook, NY 11794, USA
(2) Departamento de Teoría de la Señal y Comunicaciones
Universidad Carlos III de Madrid
Avda. de la Universidad 30, Leganés, 28911 Madrid, Spain
e-mail: djuric@ece.sunysb.edu, joaquin.miguez@uc3m.es

This work has been supported by the National Science Foundation under Award CCF-0515246 and the Office of Naval Research under Award N00014-06-1-0012. The work was carried out while the first author held the Chair of Excellence of Universidad Carlos III de Madrid-Banco de Santander.
ABSTRACT
One of the most basic problems in science and engineering is the assessment of a considered model. The model should describe a set of observed data, and the objective is to find ways of deciding whether the model should be rejected. This appears to be an ill-posed problem because we would have to test the model against all possible alternative models. In this paper we use the Kolmogorov–Smirnov statistic to develop a test that indicates whether the model should be kept or rejected. We explain how this test can be implemented in the context of particle filtering, and we demonstrate the performance of the proposed method by computer simulations.
Index Terms— Model assessment, particle filtering, Kolmogorov–Smirnov statistics
1. INTRODUCTION
The power of science has long been recognized through the ability of the scientific method to predict the future accurately and consistently. Often the accuracy is quantified by the discrepancy between future observations (once they are observed) and sets of predicted observations. In
a general setting, a model M is used to predict future observations
and one way of producing them is by employing the predictive distri-
bution of the data conditioned on the model. We write the predictive
distribution of the set of observations $y_{1:T} \equiv \{y_1, y_2, \ldots, y_T\}$ conditioned on $M$ as $p(y_{1:T} \,|\, M)$, where

$$p(y_{1:T} \,|\, M) = p(y_1 \,|\, M) \prod_{t=1}^{T-1} p(y_{t+1} \,|\, y_{1:t}, M) \qquad (1)$$

with the factors in (1), $p(y_{t+1} \,|\, y_{1:t}, M)$, being predictive distributions themselves. At time instant $t$, $y_{t+1}$ is a future observation modeled by $M$, and $y_{1:t} \equiv \{y_1, y_2, \ldots, y_t\}$ is the set of known observations.
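To illustrate how (1) can be evaluated sequentially, the following sketch accumulates the one-step-ahead log-predictive densities for a toy linear-Gaussian local-level model, where each factor $p(y_{t+1} \,|\, y_{1:t}, M)$ is Gaussian and available in closed form from the Kalman recursions. The model, the parameter values, and the function name are illustrative assumptions, not taken from the paper.

```python
import numpy as np
from scipy import stats

def log_predictive_likelihood(y, m0=0.0, P0=1.0, q=1.0, r=0.25):
    """Accumulate log p(y_{1:T} | M) as the sum of one-step-ahead
    log-predictive densities, mirroring the factorization in (1),
    for the local-level model x_{t+1} = x_t + u_t, y_t = x_t + v_t."""
    m, P = m0, P0          # prior mean and variance of the hidden state
    log_lik = 0.0
    for yt in y:
        S = P + r          # variance of p(y_t | y_{1:t-1}, M)
        log_lik += stats.norm.logpdf(yt, loc=m, scale=np.sqrt(S))
        K = P / S          # Kalman gain
        m = m + K * (yt - m)
        P = (1.0 - K) * P
        P = P + q          # random-walk state transition adds noise q
    return log_lik

# illustrative data generated from the same toy model
rng = np.random.default_rng(0)
x = np.cumsum(rng.normal(0.0, 1.0, size=100))
y = x + rng.normal(0.0, 0.5, size=100)
print(log_predictive_likelihood(y))
```

For nonlinear or non-Gaussian models, each factor would instead be approximated, e.g., by a particle filter, but the sequential accumulation has the same structure.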
The observations are our physical reality and are often the only ingredient that we have when we deal with the uncertainty of the considered model(s). When we have more than one competing model for the observed data, we usually want to find the best of these models. This is known in the literature as the model selection
problem [1]. From a Bayesian perspective, the best model is typically the model that has the maximum a posteriori probability, $p(M_k \,|\, y_{1:T})$, where $M_k$ signifies the $k$-th considered model and where $y_{1:T}$ is the set of data used in computing the posterior probability of $M_k$. One can show that by using this criterion, one
balances the goodness of fit and the complexity of the model. The implementation of model selection is a well-studied problem, and the literature on the subject is quite large. We point out that in this paper we are interested in the class of dynamic models that are nonlinear and that may contain non-Gaussian noise, for which model selection may not be a trivial task. However, since particle filtering is often the method of choice for nonlinear dynamic models, it is useful to have approaches for model selection within the context of particle filtering. It can be shown that model selection in that case can be accomplished by following a well-established theory (see, for example, [2]).
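As an aside, once estimates of the marginal likelihoods $p(y_{1:T} \,|\, M_k)$ are available, e.g., one per candidate model from separately run particle filters, the posterior model probabilities follow directly from Bayes' rule. The sketch below is a minimal illustration; the function name, the uniform model prior, and the numerical values are assumptions for the example.

```python
import numpy as np

def posterior_model_probabilities(log_marginals, log_priors=None):
    """p(M_k | y_{1:T}) proportional to p(y_{1:T} | M_k) p(M_k),
    computed in the log domain for numerical stability."""
    log_m = np.asarray(log_marginals, dtype=float)
    if log_priors is None:
        log_priors = np.zeros_like(log_m)   # uniform prior over models
    log_post = log_m + log_priors
    log_post -= log_post.max()              # guard against underflow
    w = np.exp(log_post)
    return w / w.sum()

# e.g., log p(y_{1:T} | M_k) estimates from three particle filters
print(posterior_model_probabilities([-512.3, -498.7, -505.1]))
```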
In this paper, by contrast, we deal with a scenario where we have only one model, and we want to decide whether to keep the model or reject it. Clearly, any meaningful analysis of data requires the possibility of excluding the model in use if it fails to provide a satisfactory description of the data [3]. The problem of evaluating a single model is not an easy one because it appears to be ill-posed, in the sense that we have to test a model $M$ against unstated alternatives. If there is a true model denoted by $M_0$, we have to test the hypothesis
$$H : M = M_0. \qquad (2)$$
In [1] this formulation of the problem is considered "rather too general to develop further in any detail." The difficulty of this "ill defined problem of model rejection" is alleviated by specifying a large set of alternative models, parameterized by some conveniently chosen set of parameters, where the model $M_0$ is some form of parametric restriction of a more general class of models denoted by $M_1$. The problem then becomes one of model selection.
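Before turning to the proposed method, a generic and intentionally simplified illustration of testing a single model in the spirit of hypothesis (2) is to draw replicate observations from the model's predictive distribution and compare them with the actual observations via a two-sample Kolmogorov–Smirnov test, treating the samples as i.i.d. for simplicity. This sketch is not the authors' procedure; the data, sample sizes, and significance level are assumptions.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(1)
y_observed = rng.normal(loc=0.3, scale=1.0, size=200)     # stand-in for real data
y_replicated = rng.normal(loc=0.0, scale=1.0, size=2000)  # draws from M's predictive

# two-sample KS test: large statistic / small p-value suggests rejecting M
result = stats.ks_2samp(y_observed, y_replicated)
if result.pvalue < 0.05:
    print(f"reject M: KS = {result.statistic:.3f}, p = {result.pvalue:.3g}")
else:
    print(f"keep M: KS = {result.statistic:.3f}, p = {result.pvalue:.3g}")
```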
Here we propose a method that is truly a method for model assessment and that does not require defining alternative models. We anchor the procedure on the observations that have been made and on the model-based predicted observations. As in model selection, the key role in the assessment is played by the predictive distribution of the data conditioned on the assessed model. Under certain mild assumptions,