Pedestrian Association and Localization in Monocular FIR Video Sequence
Mayank Bansal, Shunguang Wu, Jayan Eledath
Sarnoff Corporation
201 Washington Rd, Princeton, NJ, USA
{mbansal,swu,jeledath}@sarnoff.com
Abstract
This paper addresses the frame-to-frame data associ-
ation and state estimation problems in localization of a
pedestrian relative to a moving vehicle from a monocular
far infra-red video sequence. Using a novel application
of the hierarchical model-based motion estimation frame-
work, we are able to use the image appearance informa-
tion to solve the frame-to-frame data association problem
and estimate a sub-pixel accurate height ratio for a pedes-
trian in two frames. Then, to localize the pedestrian, we
propose a novel approach of using the pedestrian height
ratio estimates to guide an interacting multiple-hypothesis-
mode/height filtering algorithm instead of using a constant
pedestrian height model. Experiments on several IR sequences
demonstrate that this approach achieves results comparable
to those obtained with a known pedestrian height, thus
avoiding the errors of an approach based on a constant
height model.
1. Introduction
In recent years, there has been an increased use of
visual sensors in automotive safety and convenience
applications. One important safety application is the
detection of pedestrians [17] at night. Visible-range cameras
do not provide sufficient contrast to detect pedestrians well,
a problem that is handled well by near and far infra-red
(NIR, FIR) cameras. FIR cameras carry the advantage of
target heat sensitivity without the need for active ambient
illumination. The images of vehicles, pedestrians and
animals are significantly enhanced and are clearly visible under
otherwise poor visibility conditions. Accurately estimating
the 3D location of the pedestrian relative to the moving
vehicle is important for accurate warnings. This is a
challenging problem because the system has to rely on temporal
tracking to estimate the location, so both frame-to-frame data
association and state-estimation filtering become
important. In this paper, we focus on the data-association
and state-estimation aspects.
In FIR imagery, the appearance of a pedestrian does not
change much from frame to frame, so it becomes possible
to match a pedestrian across time. This temporal
image-based matching approach helps the tracker by a) reducing
the state space, and hence the complexity of the filter,
because the filter need not maintain an appearance model,
b) providing an alternative, more robust
means of data association in case of missed detections, and
c) explicitly estimating a sub-pixel object size ratio (which
we call scale) in the image between two frames. In this
paper, we describe an application of the hierarchical
model-based motion estimation paradigm of [4] to match
pedestrian appearance over time without explicitly modeling the
pedestrian shape. The appearance matching is used first
to resolve the frame-to-frame association of the detections
and then to estimate the scale across time, which allows
a multiple-hypothesis-mode filtering algorithm to be
employed for the state estimation phase.
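To see why the scale must be estimated to sub-pixel accuracy, the following minimal sketch relates the image height of a pedestrian to its distance. It assumes an ideal pinhole camera and an upright pedestrian; the function names and all numerical values (focal length, height, distances) are hypothetical and chosen only for illustration. They are not taken from the experiments in this paper.

```python
def image_height_px(focal_px, height_m, distance_m):
    """Pinhole projection: image height in pixels of an upright object."""
    return focal_px * height_m / distance_m


def scale_between_frames(distance_prev_m, distance_curr_m):
    """Height ratio h_curr / h_prev under the pinhole model.

    The focal length and the true height cancel, so the ratio depends
    only on the two distances.
    """
    return distance_prev_m / distance_curr_m


# Hypothetical values, for illustration only.
focal_px = 800.0   # assumed focal length in pixels
height_m = 1.7     # assumed true pedestrian height in meters

h_prev = image_height_px(focal_px, height_m, 40.0)  # pedestrian at 40 m
h_curr = image_height_px(focal_px, height_m, 39.0)  # one meter closer
scale = scale_between_frames(40.0, 39.0)

print(f"h_prev = {h_prev:.2f} px, h_curr = {h_curr:.2f} px, scale = {scale:.4f}")
```

With these assumed numbers the pedestrian is about 34 pixels tall and grows by less than one pixel between the two frames, so a height ratio read off from integer bounding boxes would be dominated by quantization.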
To obtain a more accurate 3D localization, instead of
using a constant pedestrian height H (one mode) for all
pedestrians, this paper presents a multiple-hypothesis-mode
filtering algorithm where each mode assumes a potential
discrete height value for the pedestrian and runs as a
separate filter. The probability of each filter is obtained by
evaluating the likelihood of the pedestrian scale estimated
by that filter given the scale measured by the appearance
matcher. The final pedestrian location can be obtained
either by combining the mode estimates or by choosing the
one with the highest likelihood value.
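The mode weighting described above can be sketched for a single measurement update as follows. This is only an illustration under stated assumptions, not the filter used in this paper: it assumes a pinhole camera, a stationary pedestrian, a known forward displacement of the vehicle between the two frames, and a Gaussian likelihood on the scale. The per-mode filters are reduced to a direct distance computation, and all names and numerical values are hypothetical.

```python
import math


def gaussian_likelihood(x, mean, sigma):
    """Value of a 1D Gaussian density with the given mean and sigma at x."""
    z = (x - mean) / sigma
    return math.exp(-0.5 * z * z) / (sigma * math.sqrt(2.0 * math.pi))


def update_height_modes(heights_m, weights, h_prev_px, measured_scale,
                        ego_advance_m, focal_px, scale_sigma):
    """One weight update over discrete height hypotheses (modes).

    Each mode converts the previous image height into a distance using
    its own assumed height, predicts the scale after the vehicle has
    advanced, and is weighted by how well that prediction agrees with
    the measured scale.
    """
    unnormalized, distances = [], []
    for height, weight in zip(heights_m, weights):
        z_prev = focal_px * height / h_prev_px   # distance implied by mode
        z_curr = z_prev - ego_advance_m          # stationary pedestrian
        if z_curr <= 0.0:
            likelihood = 0.0                     # physically impossible mode
        else:
            predicted_scale = z_prev / z_curr
            likelihood = gaussian_likelihood(
                measured_scale, predicted_scale, scale_sigma)
        unnormalized.append(weight * likelihood)
        distances.append(z_curr)
    total = sum(unnormalized)
    if total == 0.0:
        return list(weights), distances          # keep prior if uninformative
    return [u / total for u in unnormalized], distances


# Hypothetical scenario: true height 1.7 m, seen at 40 m, vehicle advances 1 m.
heights = [1.5, 1.7, 1.9]
prior = [1.0 / 3.0] * 3
new_weights, mode_distances = update_height_modes(
    heights, prior, h_prev_px=34.0, measured_scale=40.0 / 39.0,
    ego_advance_m=1.0, focal_px=800.0, scale_sigma=0.002)

# Two ways to report the result: the best mode, or the weighted combination.
best_mode = max(range(len(heights)), key=lambda i: new_weights[i])
fused_distance = sum(w * z for w, z in zip(new_weights, mode_distances))
print(heights[best_mode], round(fused_distance, 2))
```

In this sketch the 1.7 m hypothesis receives the largest weight because its predicted scale matches the measurement. The last two lines correspond to the two options mentioned above: selecting the most likely mode or combining the mode estimates by their weights.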
Related Work. Gandhi et al. [8] have given a comprehensive
survey of recent research on pedestrian collision
avoidance systems. The paper reviews various approaches
38 978-1-4244-3993-5/09/$25.00 ©2009 IEEE