IEEE TRANSACTIONS ON CIRCUITS AND SYSTEMS FOR VIDEO TECHNOLOGY, VOL. 19, NO. 3, MARCH 2009 347
Exemplar-Based Video Inpainting Without
Ghost Shadow Artifacts by Maintaining
Temporal Continuity
Timothy K. Shih, Senior Member, IEEE, Nick C. Tang, and Jenq-Neng Hwang, Fellow, IEEE
Abstract—Image inpainting or image completion is the tech-
nique that automatically restores/completes removed areas in an
image. When dealing with a similar problem in video, not only
should a robust tracking algorithm be used, but the temporal con-
tinuity among video frames also needs to be taken into account,
especially when the video has camera motions such as zooming
and tilting. In this paper, we extend an exemplar-based image in-
painting algorithm by incorporating an improved patch matching
strategy for video inpainting. In our proposed algorithm, different
motion segments with different temporal continuity call for dif-
ferent candidate patches, which are used to inpaint holes after a
selected video object is tracked and removed. The proposed new
video inpainting algorithm produces very few “ghost shadows,”
which are produced by most image inpainting algorithms when
applied directly to video. Our experiments use different types of
videos, including cartoons, video from games, and video from
digital cameras with different camera motions. Our demonstration
at http://member.mine.tku.edu.tw/www/T CSVT/web/ shows
promising results.
Index Terms—Digital inpainting, image completion, motion map
segmentation, object removal, object tracking, video inpainting,
video special effect.
I. INTRODUCTION
Image inpainting/image completion [1], [3], [5] is a
technique to restore/complete the area of a removed ob-
ject which is manually selected by the users. The technique
produces a reasonably good quality of output on still images.
Although there are earlier approaches that focus on removing
only small well-selected areas on photographs, the work re-
ported in [1] and [3] produces fairly good results in general
cases, especially when applied to large continuous areas. Image
inpainting techniques can complete holes based on both spatial
and frequency features. Structural properties, such as edges of
a house, are extracted from the spatial domain and used to com-
plete an object with its structural property extended [1], [3]. In
Manuscript received May 18, 2007; revised September 16, 2007 and De-
cember 12, 2007. First published February 13, 2009; current version published
April 01, 2009. This paper was recommended by Associate Editor D. S. Turaga.
T. K. Shih is with the Department of Computer Science, National Taipei University
of Education, Taipei 106, Taiwan (e-mail: tshih@cs.tku.edu.tw).
N. C. Tang is with the Department of Computer Science and Information Engineering,
Tamkang University, Tamsui 251, Taiwan (e-mail: nickctang@gmail.com).
J. N. Hwang is with the Department of Electrical Engineering, University of
Washington, Seattle, WA 98195 USA (e-mail: hwang@u.washington.edu).
Digital Object Identifier 10.1109/TCSVT.2009.2013519
addition to [1] and [3], another image completion approach [5]
uses automatic semantic scene matching to search for potential
scenes in a very large image database. The mechanism fills
the missing regions (i.e., scenes) using information usually
not in the same picture and provides a diverse set of results
for a given input. On the other hand, textural information can
be propagated from the surrounding areas toward the center
of the hole such that a seamless natural scene can be recovered
[8]. In an inpainting process, in general, the user has to select
a target object to be removed (and thus the hole is created).
Although object selection is the only step in which the user has
to intervene in the completion procedure, many mechanisms
suggest that human intelligence can be incorporated to produce
a better result [11], [17]. The work discussed in [11] uses an
interface to identify a source area, where texture information is
used to inpaint another selected target area. The work discussed
in [17] further suggests that most natural or artificial objects
can be defined by a few main curves. The salient structure of an
image should be completed before the texture characteristics
are brought in. Therefore, by asking the user to draw a few
curved lines, the algorithm proposed in [17] can produce excellent
image inpainting results. In general, the problem of image
completion can be defined as follows. Assume that the
original image $I$ is decomposed into two parts, $I = \Phi \cup \Omega$, where
$\Omega$ is a target area/hole manually identified by the user, and $\Phi$
is a source area with information to be used to complete $\Omega$.
There is no overlap between the target area and the source
area. These terms (i.e., $I$, $\Phi$, and $\Omega$) are commonly used in
most articles discussing inpainting algorithms. However, when
dealing with removing objects from a video sequence, several
issues should be further considered. First, manually selecting a
target area in every frame is impractical due to the number of frames. Second,
human-recommended structural/textural information is difficult
to obtain, even with edge detection. Therefore, the procedure
of video inpainting needs to incorporate a robust tracking
mechanism and an effective structural/textural propagation
mechanism.
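
As a minimal sketch of the problem definition above, the following Python fragment splits a frame's pixel grid into a source area and a target area (the hole) from a user-supplied mask, and the two sets never overlap. The function name `decompose`, the boolean-mask representation, and the 4 x 5 example frame are illustrative assumptions, not part of the algorithm described in this paper.

```python
from typing import List, Set, Tuple

Pixel = Tuple[int, int]  # (row, col) coordinate of one pixel


def decompose(mask: List[List[bool]]) -> Tuple[Set[Pixel], Set[Pixel]]:
    """Split the pixel grid into (source, target) coordinate sets.

    mask[r][c] is True where the pixel belongs to the hole (target area)
    and False where it belongs to the source area. This mask layout is an
    assumption made for illustration.
    """
    source: Set[Pixel] = set()
    target: Set[Pixel] = set()
    for r, row in enumerate(mask):
        for c, is_hole in enumerate(row):
            (target if is_hole else source).add((r, c))
    return source, target


# Hypothetical 4 x 5 frame with a 2 x 2 hole marked by the user.
mask = [[False] * 5 for _ in range(4)]
for r in (1, 2):
    for c in (2, 3):
        mask[r][c] = True

source, target = decompose(mask)
all_pixels = {(r, c) for r in range(4) for c in range(5)}
```

In this sketch the union of `source` and `target` covers every pixel of the frame and the two sets are disjoint, which mirrors the no-overlap condition stated in the definition. For video, one such mask per frame would have to come from a tracking step rather than from manual selection.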
One of the approaches to complete a removed object in video
is to directly apply the techniques used in image completion [1],
[3], i.e., treating each video frame as an independent image.
Most image completion techniques [1], [3] are based on one
assumption: the target area $\Omega$ has a texture similar to, and a
structure continuous with, the source area $\Phi$. Therefore, the source
and target areas are divided into equal-size patches, with the size
of a patch being small (e.g., 3 × 3 or 5 × 5 pixels). Patches
from the source area, using a sophisticated matching mecha-
1051-8215/$25.00 © 2009 IEEE