IEEE TRANSACTIONS ON IMAGE PROCESSING, VOL. 16, NO. 2, FEBRUARY 2007 545
Video Inpainting Under Constrained Camera Motion
Kedar A. Patwardhan, Student Member, IEEE, Guillermo Sapiro, Senior Member, IEEE, and Marcelo Bertalmío
Abstract—A framework for inpainting missing parts of a video
sequence recorded with a moving or stationary camera is presented
in this work. The region to be inpainted is general: it may be still
or moving, in the background or in the foreground; it may occlude
one object and be occluded by another. The algorithm
consists of a simple preprocessing stage and two steps of video
inpainting. In the preprocessing stage, we roughly segment each
frame into foreground and background. We use this segmentation
to build three image mosaics that help produce time-consistent
results and also improve the algorithm's performance by reducing
the search space. In the first video inpainting step, we
reconstruct moving foreground objects that are “occluded” by
the region to be inpainted. To this end, we fill the gap as much as
possible by copying information from the moving foreground in
other frames, using a priority-based scheme. In the second step,
we inpaint the remaining hole with the background. To accom-
plish this, we first align the frames and directly copy when pos-
sible. The remaining pixels are filled in by extending spatial texture
synthesis techniques to the spatiotemporal domain. The proposed
framework has several advantages over state-of-the-art algorithms
that deal with similar types of data and constraints. It permits some
camera motion, is simple to implement and fast, does not require
statistical models of the background or foreground, and works well in
the presence of rich, cluttered backgrounds, producing results with no
visible blurring or motion artifacts. A number of real examples,
captured with a consumer hand-held camera, are shown in support of
these findings.
Index Terms—Camera motion, special effects, texture synthesis,
video inpainting.
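Before the detailed presentation, the two-stage scheme summarized in the abstract can be sketched in code. The following is only a minimal illustration under strong simplifying assumptions (grayscale frames, naive frame-differencing segmentation, identity alignment, no priority ordering, and no texture synthesis); every helper and threshold here is a stand-in, not the authors' implementation:

```python
import numpy as np

def inpaint_video(frames, masks):
    """Sketch of the two-stage scheme: (1) fill occluded moving
    foreground by copying foreground pixels from other frames,
    (2) fill the remaining hole with aligned background.
    frames: list of HxW float arrays; masks: list of HxW bool
    arrays, True where pixels must be synthesized."""
    # Preprocessing: rough foreground/background segmentation, here
    # by simple frame differencing (foreground = "moving" pixels).
    fg = [np.zeros_like(m) for m in masks]
    for t in range(1, len(frames)):
        fg[t] = np.abs(frames[t] - frames[t - 1]) > 10
    out = [f.copy() for f in frames]
    holes = [m.copy() for m in masks]
    # Step 1: copy moving-foreground information from other frames
    # (the paper uses a priority-based copying scheme; omitted here).
    for t in range(len(frames)):
        for s in range(len(frames)):
            if s == t:
                continue
            movable = holes[t] & fg[s]
            out[t][movable] = frames[s][movable]
            holes[t] &= ~movable
    # Step 2: fill what remains with background copied from other
    # frames (alignment assumed identity here; truly leftover pixels
    # would go to spatiotemporal texture synthesis, omitted).
    for t in range(len(frames)):
        for s in range(len(frames)):
            if s == t:
                continue
            usable = holes[t] & ~fg[s]
            out[t][usable] = frames[s][usable]
            holes[t] &= ~usable
    return out
```

In the actual framework, the segmentation and alignment are computed once in preprocessing and cached as image mosaics, which both enforces temporal consistency and shrinks the search space.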
I. INTRODUCTION AND OVERVIEW
A. Introduction to the Video Inpainting Problem
THE problem of automatic video restoration in general, and
automatic object removal and modification in particular, is
beginning to attract the attention of many researchers. In this
paper we address a constrained but important case of video in-
painting. We assume that the camera motion is approximately
parallel to the plane of image projection, and the scene essentially
consists of a stationary background with a moving foreground, both of
which may require inpainting.

Manuscript received November 1, 2005; revised July 29, 2006. This work
was supported in part by the Office of Naval Research; in part by the
National Science Foundation; in part by DARPA; in part by the National
Institutes of Health; in part by the National Geospatial-Intelligence
Agency, IP-RACINE Project IST-511316; in part by the PNPGC project,
reference BFM2003-02125; and in part by the Ramón y Cajal Program. The
associate editor coordinating the review of this manuscript and
approving it for publication was Dr. Anil Kokaram.

K. A. Patwardhan and G. Sapiro are with the Department of Electrical
and Computer Engineering, University of Minnesota, Minneapolis, MN
55455 USA (e-mail: kedar@umn.edu; guille@umn.edu). M. Bertalmío is
with the University Pompeu Fabra, Barcelona, Spain (e-mail:
marcelo.bertalmio@upf.edu).

Color versions of one or more of the figures in this paper are
available online at http://ieeexplore.ieee.org. Digital Object
Identifier 10.1109/TIP.2006.888343

The algorithm
described in this paper is able to inpaint objects that move in
any fashion but do not change size appreciably. As we will see
below, these assumptions are implicitly or explicitly present in
most state-of-the-art algorithms for video inpainting, yet they
still leave a very challenging task and apply to numerous scenarios.
For a detailed discussion of these assumptions, including how they
are actually relaxed in the real examples presented here, please
refer to Section II-A.
A number of algorithms for automatic still image completion
have been proposed in the literature [3], [5], [6], [11]. These
cannot be generalized in a straightforward manner to address the
challenging problem of video completion addressed in this paper.
There has also been preliminary work on frame-by-frame video
inpainting based on partial differential equations (PDEs) [4],
following [5]. In [4], the PDE is applied spatially, completing the
video frame by frame. This ignores the temporal information that a
video provides, and its applicability is thereby limited. Moreover,
PDE-based methods interpolate edges in a smooth manner, whereas
temporal edges are often more abrupt than spatial ones.
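To make this limitation concrete, such a frame-by-frame approach can be sketched with simple heat-diffusion filling standing in for the actual transport PDE of [5] (an illustrative toy, not the method of [4]); note that no temporal information is used at all:

```python
import numpy as np

def diffuse_inpaint(frame, mask, iters=500):
    """Fill masked pixels of one grayscale frame by repeated
    4-neighbor averaging (heat equation); known pixels stay fixed."""
    out = frame.astype(float).copy()
    out[mask] = out[~mask].mean()  # crude initialization of the hole
    for _ in range(iters):
        avg = 0.25 * (np.roll(out, 1, 0) + np.roll(out, -1, 0) +
                      np.roll(out, 1, 1) + np.roll(out, -1, 1))
        out[mask] = avg[mask]      # update only inside the hole
    return out

def inpaint_video_framewise(frames, masks):
    # Purely spatial: each frame is completed independently, which
    # discards temporal coherence and can produce flicker.
    return [diffuse_inpaint(f, m) for f, m in zip(frames, masks)]
```

Because consecutive frames are completed independently, even small per-frame differences in the reconstruction show up as temporal flicker, which is precisely the drawback discussed above.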
The authors in [24] recently proposed a method for space-
time completion of damaged areas in a video sequence. They
pose the problem of video completion as a global optimization
problem, which is inherently very expensive computationally. Their
work extends to space-time the pioneering nonparametric-sampling
technique developed for still images by Efros and Leung [13]. This
carries the implicit assumption that objects move in a periodic
manner and do not significantly change scale, since otherwise the
“copy and paste” approach of [13] would fail. Although the results
are good, they suffer from several shortcomings: only low-resolution
videos are shown, and oversmoothing is often observed. This is due
in part to the fact that pixels are synthesized as a weighted
average of the best candidates, and this averaging produces
blurring. Moreover, the camera is static in all the examples in that
paper. Though the reason is not discussed, it is probably the very
simple motion estimation procedure the authors use, which involves
only the temporal derivative. We present results comparing with
their approach in the experimental section.
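For concreteness, the core of the “copy and paste” nonparametric sampling of [13] can be sketched in 2-D as follows. This toy version copies the center of the single best-matching known neighborhood rather than averaging candidates; the averaging variant is exactly what introduces the blurring noted above. The raster scan order, fixed patch size, and exhaustive search are simplifications, not the real implementations:

```python
import numpy as np

def exemplar_fill(img, mask, patch=1):
    """Toy Efros-Leung-style synthesis on a grayscale image: fill
    each unknown pixel (mask=True) by copying the center of the
    best-matching fully-known neighborhood, where matching is sum of
    squared differences over the pixel's known neighbors."""
    out = img.astype(float).copy()
    h, w = out.shape
    known = ~mask
    # Visit unknown interior pixels in raster order (real schemes
    # grow the fill front in a priority order instead).
    for y in range(patch, h - patch):
        for x in range(patch, w - patch):
            if known[y, x]:
                continue
            nbr = out[y-patch:y+patch+1, x-patch:x+patch+1]
            valid = known[y-patch:y+patch+1, x-patch:x+patch+1]
            best, best_cost = out[y, x], np.inf
            for yy in range(patch, h - patch):
                for xx in range(patch, w - patch):
                    win = known[yy-patch:yy+patch+1, xx-patch:xx+patch+1]
                    if not win.all():
                        continue  # candidate must be fully known
                    cand = out[yy-patch:yy+patch+1, xx-patch:xx+patch+1]
                    cost = ((cand - nbr)[valid] ** 2).sum()
                    if cost < best_cost:
                        best, best_cost = cand[patch, patch], cost
            out[y, x] = best   # copy, don't average -> no blurring
            known[y, x] = True
    return out
```

Extending this search from 2-D patches to 3-D space-time cubes, as [24] does, is what makes the assumptions of periodic motion and constant scale necessary.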
An interesting probabilistic video modeling technique has been
proposed in [10], with application to video inpainting. The authors
define “epitomes” as patch-based probability models learned by
compiling together a large number of example patches from input
images. These epitomes are then used to synthesize data in areas of
video damage or object removal. The reported video inpainting
results are similar to those in [24]: they are primarily low
resolution, and oversmoothing is also observed.
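A heavily simplified illustration of the idea: example patches are compiled into a small set of averaged prototypes (plain k-means here, a loose stand-in for the probabilistic epitome of [10]), and a damaged patch is completed from the prototype best matching its known pixels. The function names are hypothetical; the averaging into prototypes hints at why such models tend to oversmooth:

```python
import numpy as np

def build_epitome(patches, k=2, iters=10, seed=0):
    """Compile example patches into k averaged prototypes via a
    crude k-means (stand-in for the learned epitome model)."""
    rng = np.random.default_rng(seed)
    X = np.stack([p.ravel() for p in patches]).astype(float)
    centers = X[rng.choice(len(X), size=k, replace=False)].copy()
    for _ in range(iters):
        d = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(-1)
        lbl = d.argmin(1)
        for j in range(k):
            if (lbl == j).any():
                centers[j] = X[lbl == j].mean(0)  # average members
    return centers

def synthesize_patch(patch, mask, centers):
    """Fill unknown entries (mask=True) from the prototype that best
    matches the known entries of the damaged patch."""
    v, m = patch.astype(float).ravel(), mask.ravel()
    d = ((centers[:, ~m] - v[~m]) ** 2).sum(1)
    v[m] = centers[d.argmin()][m]
    return v.reshape(patch.shape)
```

Because each prototype is a mean over many examples, the synthesized content inherits their average, which is consistent with the oversmoothing reported for [10] and [24].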
Very interesting work on repairing damaged video has recently been
reported in [15]. Their method combines a gamut of different
techniques that make the inpainting process quite complicated, and
it requires a significant amount of user interaction: the