GEOMETRY-BASED ESTIMATION OF OCCLUSIONS FROM VIDEO FRAME PAIRS
Serdar Ince and Janusz Konrad
Department of Electrical and Computer Engineering, Boston University
8 Saint Mary’s St., Boston, MA 02215
ABSTRACT
The knowledge of occlusions and newly-exposed areas, a natural
consequence of changing object juxtaposition in a 3-D scene, can
be effectively used to improve video coding efficiency, video rate
conversion quality and view interpolation fidelity. Although var-
ious occlusion estimation methods have been proposed to date,
most of them are not robust or are computationally complex. In
this paper, we study two simple, well-known occlusion estimation
methods: one based on a photometric mismatch between two frames of
an image sequence, the other on a geometric mismatch. We
demonstrate their weaknesses and propose a
new geometric method that exhibits good robustness to noise in
the data while maintaining low computational complexity.
1. INTRODUCTION
Occlusion effects occurring in image sequences are a natural con-
sequence of changing object juxtaposition in a 3-D scene. These
effects result in parts of an image frame disappearing in the fol-
lowing frame(s), known as occlusion areas, or appearing in the
following frame(s), known as newly-exposed areas. Both types
of areas play a very important role in motion estimation from dy-
namic imagery and in disparity estimation from stereo or multi-
view imagery: for frame points in occlusion areas, forward motion
is undefined (those points disappear in the next frame); similarly,
for frame points in newly-exposed areas, backward motion is
undefined. Consequently, motion parameters should not be computed
for image points belonging to either type of area, as they
are meaningless. However, and this is the second observation,
most motion estimation algorithms employ some form of regu-
larization (explicit motion smoothness prior, block-based motion
model, intensity matching over a window, etc.). Since in occlusion
and newly-exposed areas motion parameters are undefined, regu-
larization should be disallowed between image points from those
areas and neighboring points with well-defined motion. In order to
achieve this, occlusion and newly-exposed areas must be explicitly
known. Similar observations apply to disparity estimation.
Estimation of occlusion and newly-exposed areas is an inverse
problem and, as such, is ill-posed. Most occlusion/newly-exposed-area
estimation methods rely on three or more frames to make a decision
about individual image points [1, 2, 3, 4, 5]. Such methods compare
the intensity consistency between the current and previous frame(s)
with that between the current and future frame(s). In general, this
improves the reliability of occlusion estimation, but it requires
larger buffers and is computationally more complex. Methods have also
been proposed that estimate newly-exposed areas from two image frames
only. However, such methods based on a photometric detection
mechanism (intensity mismatch) [6, 7] are not reliable, while those
based on a geometric mechanism (motion-vector mismatch) [8], although
more reliable under high-PSNR conditions, still fail on noisy data.
This work was supported by the National Science Foundation under
grant ECS-0219224. E-mail: {ince,jkonrad}@bu.edu
In this paper, we propose a simple method for the detection
of occlusion and newly-exposed areas that is based on geometric
properties of the motion field. The method is applicable to any
motion field derived from an image pair. Its principle is based on
the observation that the regular grid in the reference image plane,
at which the motion vectors are anchored, forms an irregular grid
in the target image plane after motion compensation. Since the
target image will contain no motion-compensated projections in
the newly-exposed areas, such areas can be easily detected. We
present a simple neighborhood test to detect newly-exposed pix-
els and we compare our approach with standard photometry- and
geometry-based methods.
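The principle above can be sketched as follows. The function below is a hypothetical illustration, not the authors' implementation: it projects the reference grid through a given forward motion field, rounds each projected position to the nearest target pixel, and flags target pixels that receive no projection as newly exposed (the paper's actual neighborhood test may differ).

```python
import numpy as np

def newly_exposed_mask(d_f):
    """Flag target-frame pixels that receive no motion-compensated
    projection from the reference frame.

    d_f -- (H, W, 2) forward motion field anchored on the reference
           grid, with (dy, dx) components (an assumed layout).
    Returns a boolean (H, W) mask, True where the target pixel is
    presumed newly exposed.
    """
    H, W, _ = d_f.shape
    ys, xs = np.mgrid[0:H, 0:W]
    # The regular reference grid becomes an irregular grid in the
    # target image plane after motion compensation.
    ty = ys + d_f[..., 0]
    tx = xs + d_f[..., 1]
    # Round each projected point to its nearest target pixel and
    # discard points that fall outside the target frame.
    iy = np.rint(ty).astype(int)
    ix = np.rint(tx).astype(int)
    inside = (iy >= 0) & (iy < H) & (ix >= 0) & (ix < W)
    covered = np.zeros((H, W), dtype=bool)
    covered[iy[inside], ix[inside]] = True
    # Uncovered target pixels received no projection: these are the
    # candidate newly-exposed areas.
    return ~covered
```

With a zero motion field every target pixel is covered and the mask is empty; under a uniform rightward shift the left band of the target frame, which no reference pixel maps into, is flagged.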
2. PHOTOMETRY-BASED ESTIMATION OF
OCCLUSIONS
The usual assumption in the estimation of occlusions from two
frames is that an excessive intensity-matching (motion-compensated
prediction) error is observed: reference-frame pixels that disappear
cannot be accurately matched in the target frame and thus induce
significant errors. Let I1[x] denote the intensity of the first
frame of a sequence at spatial position x, and I2[x] the
corresponding intensity in the second frame. If d^f denotes a
forward motion (disparity) field anchored on the sampling grid of
frame #1 (the reference) and pointing to the target frame #2, while
d^b denotes a backward motion field, then the corresponding
motion-compensated prediction errors at x are:

ε^f[x] = I1[x] − I2[x + d^f[x]],
ε^b[x] = I2[x] − I1[x − d^b[x]].
The usual occlusion detection methods then declare a pixel in the
reference frame as occluded in the target frame if |ε^f[x]| > Θ for
frame #1 and |ε^b[x]| > Θ for frame #2. Note that although
newly-exposed areas cannot be detected directly by this mechanism
(the pixels are not visible in the reference frame), effectively the
occluded areas of frame #2 (computed using d^b) are in fact the
newly-exposed areas for frame #1.
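As a concrete sketch of this photometric test, the function below computes the forward prediction error ε^f and thresholds its magnitude. The nearest-neighbor sampling of the target frame and the threshold value are assumptions for illustration, not taken from the paper.

```python
import numpy as np

def photometric_occlusion_mask(I1, I2, d_f, theta=20.0):
    """Photometric test: declare reference pixel x occluded when
    |I1[x] - I2[x + d_f[x]]| > theta.

    I1, I2 -- (H, W) intensity frames.
    d_f    -- (H, W, 2) forward motion field, (dy, dx) components.
    theta  -- assumed error threshold (a free parameter).
    """
    H, W = I1.shape
    ys, xs = np.mgrid[0:H, 0:W]
    # Nearest-neighbor sampling of the target frame at x + d_f[x],
    # with positions clamped to the frame borders.
    ty = np.clip(np.rint(ys + d_f[..., 0]).astype(int), 0, H - 1)
    tx = np.clip(np.rint(xs + d_f[..., 1]).astype(int), 0, W - 1)
    eps_f = I1.astype(float) - I2[ty, tx].astype(float)
    return np.abs(eps_f) > theta
```

Applying the same test to I2 with the backward field d^b yields the occluded areas of frame #2, i.e., the newly-exposed areas for frame #1.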
3. GEOMETRY-BASED ESTIMATION OF OCCLUSIONS
– TRADITIONAL APPROACH
An alternative to the photometric detection of occlusion areas is
geometric detection, based on the assumption that a mismatch between
forward and backward motion vectors is due to disappearing image
areas. In particular, the following vector
II - 933 0-7803-8874-7/05/$20.00 ©2005 IEEE ICASSP 2005
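A common form of such a vector-mismatch test checks whether the forward vector at x and the backward vector sampled at the motion-compensated position x + d^f[x] cancel. The sketch below is a generic formulation under assumed conventions (nearest-neighbor sampling, threshold tau), not necessarily the exact criterion used here.

```python
import numpy as np

def geometric_occlusion_mask(d_f, d_b, tau=1.0):
    """Forward-backward mismatch test: flag reference pixel x when
    |d_f[x] + d_b[x + d_f[x]]| > tau.

    d_f, d_b -- (H, W, 2) forward/backward motion fields, (dy, dx).
    tau      -- assumed mismatch threshold (a free parameter).
    """
    H, W, _ = d_f.shape
    ys, xs = np.mgrid[0:H, 0:W]
    # Sample the backward field at the motion-compensated position,
    # rounding to the nearest pixel and clamping to the borders.
    ty = np.clip(np.rint(ys + d_f[..., 0]).astype(int), 0, H - 1)
    tx = np.clip(np.rint(xs + d_f[..., 1]).astype(int), 0, W - 1)
    # Where both fields describe the same visible surface, the
    # vectors cancel and the mismatch is near zero.
    mismatch = d_f + d_b[ty, tx]
    return np.linalg.norm(mismatch, axis=-1) > tau
```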