GEOMETRY-BASED ESTIMATION OF OCCLUSIONS FROM VIDEO FRAME PAIRS
Serdar Ince and Janusz Konrad
Department of Electrical and Computer Engineering, Boston University
8 Saint Mary’s St., Boston, MA 02215
ABSTRACT
The knowledge of occlusions and newly-exposed areas, a natural
consequence of changing object juxtaposition in a 3-D scene, can
be effectively used to improve video coding efficiency, video rate
conversion quality and view interpolation fidelity. Although var-
ious occlusion estimation methods have been proposed to date,
most of them are not robust or are computationally complex. In
this paper, we study two simple, well-known occlusion estimation
methods: one based on a photometric mismatch between two frames of
an image sequence, the other on a geometric mismatch. We
demonstrate their weaknesses and propose a
new geometric method that exhibits good robustness to noise in
the data while maintaining low computational complexity.
1. INTRODUCTION
Occlusion effects occurring in image sequences are a natural con-
sequence of changing object juxtaposition in a 3-D scene. These
effects result in parts of an image frame disappearing in the fol-
lowing frame(s), known as occlusion areas, or appearing in the
following frame(s), known as newly-exposed areas. Both types
of areas play a very important role in motion estimation from dy-
namic imagery and in disparity estimation from stereo or multi-
view imagery: for frame points in occlusion areas, forward motion
is undefined (those points disappear in the next frame); similarly,
for frame points in newly-exposed areas, backward motion is
undefined. Consequently, motion parameters should not be computed
for image points belonging to either type of area, as they
are meaningless. However, and this is the second observation,
most motion estimation algorithms employ some form of regu-
larization (explicit motion smoothness prior, block-based motion
model, intensity matching over a window, etc.). Since in occlusion
and newly-exposed areas motion parameters are undefined, regu-
larization should be disallowed between image points from those
areas and neighboring points with well-defined motion. In order to
achieve this, occlusion and newly-exposed areas must be explicitly
known. Similar observations apply to disparity estimation.
Estimation of occlusion and newly-exposed areas is an inverse
problem and, as such, is ill-posed. Most occlusion/newly-exposed-area
estimation methods rely on three or more frames to make a decision
about individual image points [1, 2, 3, 4, 5]. Such methods compare
the intensity consistency between the current and previous frame(s)
with that between the current and future frame(s). In general, this
improves the reliability of occlusion estimation, but it requires
larger buffers and is computationally more complex. Methods have also
been proposed that estimate newly-exposed areas from two image frames
only. However, such methods based on a photometric detection
mechanism (intensity mismatch) [6, 7] are not reliable, while those
based on a geometric mechanism (motion-vector mismatch) [8], although
more reliable under high-PSNR conditions, still fail on noisy data.
This work was supported by the National Science Foundation under
grant ECS-0219224. E-mail: {ince,jkonrad}@bu.edu
In this paper, we propose a simple method for the detection
of occlusion and newly-exposed areas that is based on geometric
properties of the motion field. The method is applicable to any
motion field derived from an image pair. Its principle is based on
the observation that the regular grid in the reference image plane,
at which the motion vectors are anchored, forms an irregular grid
in the target image plane after motion compensation. Since the
target image will contain no motion-compensated projections in
the newly-exposed areas, such areas can be easily detected. We
present a simple neighborhood test to detect newly-exposed pix-
els and we compare our approach with standard photometry- and
geometry-based methods.
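The principle above can be sketched as follows. The function below is a hypothetical illustration, not the authors' implementation: it projects the reference grid through a given forward motion field, rounds each projected position to the nearest target pixel, and flags target pixels that receive no projection as newly exposed (the paper's actual neighborhood test may differ).

```python
import numpy as np

def newly_exposed_mask(d_f):
    """Flag target-frame pixels that receive no motion-compensated
    projection from the reference frame.

    d_f -- (H, W, 2) forward motion field anchored on the reference
           grid, with (dy, dx) components (an assumed layout).
    Returns a boolean (H, W) mask, True where the target pixel is
    presumed newly exposed.
    """
    H, W, _ = d_f.shape
    ys, xs = np.mgrid[0:H, 0:W]
    # The regular reference grid becomes an irregular grid in the
    # target image plane after motion compensation.
    ty = ys + d_f[..., 0]
    tx = xs + d_f[..., 1]
    # Round each projected point to its nearest target pixel and
    # discard points that fall outside the target frame.
    iy = np.rint(ty).astype(int)
    ix = np.rint(tx).astype(int)
    inside = (iy >= 0) & (iy < H) & (ix >= 0) & (ix < W)
    covered = np.zeros((H, W), dtype=bool)
    covered[iy[inside], ix[inside]] = True
    # Uncovered target pixels received no projection: these are the
    # candidate newly-exposed areas.
    return ~covered
```

With a zero motion field every target pixel is covered and the mask is empty; under a uniform rightward shift the left band of the target frame, which no reference pixel maps into, is flagged.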
2. PHOTOMETRY-BASED ESTIMATION OF
OCCLUSIONS
The usual assumption in the estimation of occlusions from two
frames is that an excessive intensity-matching (motion-compensated
prediction) error is observed: reference-frame pixels that disappear
cannot be accurately matched in the target frame and thus induce
significant errors. Let I1[x] denote the intensity of the first
frame of a sequence at spatial position x, and I2[x] the
corresponding intensity in the second frame. If d^f denotes a
forward motion (disparity) field anchored on the sampling grid of
frame #1 (the reference) and pointing to the target frame #2, while
d^b denotes a backward motion field, then the corresponding
motion-compensated prediction errors at x are:

ε^f[x] = I1[x] − I2[x + d^f[x]],
ε^b[x] = I2[x] − I1[x − d^b[x]].
The usual occlusion detection methods then declare a pixel in the
reference frame as occluded in the target frame if |ε^f[x]| > Θ for
frame #1 and |ε^b[x]| > Θ for frame #2. Note that although
newly-exposed areas cannot be detected directly by this mechanism
(the pixels are not visible in the reference frame), effectively the
occluded areas of frame #2 (computed using d^b) are in fact the
newly-exposed areas for frame #1.
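As a concrete sketch of this photometric test, the function below computes the forward prediction error ε^f and thresholds its magnitude. The nearest-neighbor sampling of the target frame and the threshold value are assumptions for illustration, not taken from the paper.

```python
import numpy as np

def photometric_occlusion_mask(I1, I2, d_f, theta=20.0):
    """Photometric test: declare reference pixel x occluded when
    |I1[x] - I2[x + d_f[x]]| > theta.

    I1, I2 -- (H, W) intensity frames.
    d_f    -- (H, W, 2) forward motion field, (dy, dx) components.
    theta  -- assumed error threshold (a free parameter).
    """
    H, W = I1.shape
    ys, xs = np.mgrid[0:H, 0:W]
    # Nearest-neighbor sampling of the target frame at x + d_f[x],
    # with positions clamped to the frame borders.
    ty = np.clip(np.rint(ys + d_f[..., 0]).astype(int), 0, H - 1)
    tx = np.clip(np.rint(xs + d_f[..., 1]).astype(int), 0, W - 1)
    eps_f = I1.astype(float) - I2[ty, tx].astype(float)
    return np.abs(eps_f) > theta
```

Applying the same test to I2 with the backward field d^b yields the occluded areas of frame #2, i.e., the newly-exposed areas for frame #1.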
3. GEOMETRY-BASED ESTIMATION OF OCCLUSIONS
– TRADITIONAL APPROACH
An alternative to the photometric detection of occlusion areas is
geometric detection, based on the assumption that a mismatch between
forward and backward motion vectors is due to disappearing image
areas. In particular, the following vector
II - 933 0-7803-8874-7/05/$20.00 ©2005 IEEE ICASSP 2005
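A common form of such a vector-mismatch test checks whether the forward vector at x and the backward vector sampled at the motion-compensated position x + d^f[x] cancel. The sketch below is a generic formulation under assumed conventions (nearest-neighbor sampling, threshold tau), not necessarily the exact criterion used here.

```python
import numpy as np

def geometric_occlusion_mask(d_f, d_b, tau=1.0):
    """Forward-backward mismatch test: flag reference pixel x when
    |d_f[x] + d_b[x + d_f[x]]| > tau.

    d_f, d_b -- (H, W, 2) forward/backward motion fields, (dy, dx).
    tau      -- assumed mismatch threshold (a free parameter).
    """
    H, W, _ = d_f.shape
    ys, xs = np.mgrid[0:H, 0:W]
    # Sample the backward field at the motion-compensated position,
    # rounding to the nearest pixel and clamping to the borders.
    ty = np.clip(np.rint(ys + d_f[..., 0]).astype(int), 0, H - 1)
    tx = np.clip(np.rint(xs + d_f[..., 1]).astype(int), 0, W - 1)
    # Where both fields describe the same visible surface, the
    # vectors cancel and the mismatch is near zero.
    mismatch = d_f + d_b[ty, tx]
    return np.linalg.norm(mismatch, axis=-1) > tau
```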