IEEE TRANSACTIONS ON CIRCUITS AND SYSTEMS FOR VIDEO TECHNOLOGY, VOL. 19, NO. 3, MARCH 2009 347

Exemplar-Based Video Inpainting Without Ghost Shadow Artifacts by Maintaining Temporal Continuity

Timothy K. Shih, Senior Member, IEEE, Nick C. Tang, and Jenq-Neng Hwang, Fellow, IEEE

Abstract: Image inpainting, or image completion, is the technique of automatically restoring/completing removed areas in an image. When dealing with a similar problem in video, not only should a robust tracking algorithm be used, but the temporal continuity among video frames also needs to be taken into account, especially when the video has camera motions such as zooming and tilting. In this paper, we extend an exemplar-based image inpainting algorithm by incorporating an improved patch matching strategy for video inpainting. In our proposed algorithm, different motion segments with different temporal continuity call for different candidate patches, which are used to inpaint holes after a selected video object is tracked and removed. The proposed video inpainting algorithm produces very few "ghost shadows," which are produced by most image inpainting algorithms when applied directly to video. Our experiments use different types of videos, including cartoons, video from games, and video from digital cameras with different camera motions. Our demonstration at http://member.mine.tku.edu.tw/www/TCSVT/web/ shows the promising results.

Index Terms: Digital inpainting, image completion, motion map segmentation, object removal, object tracking, video inpainting, video special effect.

I. INTRODUCTION

Image inpainting/image completion [1], [3], [5] is a technique that restores/completes the area of a removed object manually selected by the user. The technique produces output of reasonably good quality on still images.
Although earlier approaches focus on removing only small, well-selected areas in photographs, the work reported in [1] and [3] produces fairly good results in general cases, especially when applied to large continuous areas. Image inpainting techniques can complete holes based on both spatial and frequency features. Structural properties, such as the edges of a house, are extracted from the spatial domain and used to complete an object by extending its structure into the hole [1], [3]. In addition to [1] and [3], another image completion approach [5] uses automatic semantic scene matching to search for potential scenes in a very large image database. This mechanism fills the missing regions (i.e., scenes) using information that usually does not come from the same picture, and it provides a diverse set of results for a given input. On the other hand, textural information can be propagated from the surrounding areas toward the center of the hole so that a seamless natural scene can be recovered [8].

Manuscript received May 18, 2007; revised September 16, 2007 and December 12, 2007. First published February 13, 2009; current version published April 01, 2009. This paper was recommended by Associate Editor D. S. Turaga.
T. K. Shih is with the Department of Computer Science, National Taipei University of Education, Taipei 106, Taiwan (e-mail: tshih@cs.tku.edu.tw).
N. C. Tang is with the Department of Computer Science and Information Engineering, Tamkang University, Tamsui 251, Taiwan (e-mail: nickctang@gmail.com).
J. N. Hwang is with the Department of Electrical Engineering, University of Washington, Seattle, WA 98195 USA (e-mail: hwang@u.washington.edu).
Digital Object Identifier 10.1109/TCSVT.2009.2013519

In an inpainting process, in general, the user has to select a target object to be removed (thus creating the hole).
Although object selection is the only step in which the user has to intervene in the completion procedure, many mechanisms suggest that human intelligence can be incorporated to produce a better result [11], [17]. The work discussed in [11] uses an interface to identify a source area, from which texture information is taken to inpaint another selected target area. The work discussed in [17] further suggests that most natural or artificial objects can be defined by a few main curves, so the salient structure of an image should be completed before the texture characteristics are brought in. Therefore, by asking the user to draw a few curved lines, the algorithm proposed in [17] can produce excellent image inpainting results.

In general, the problem of image completion can be defined as follows. Assume that the original image $I$ is decomposed into two parts, $I = \Phi \cup \Omega$, where $\Omega$ is a target area/hole manually identified by the user, and $\Phi$ is a source area with information to be used to complete $\Omega$. The target area and the source area do not overlap, i.e., $\Phi \cap \Omega = \emptyset$. These terms (i.e., $I$, $\Phi$, and $\Omega$) are commonly used in most articles discussing inpainting algorithms.

However, when removing objects from a video sequence, several further issues must be considered. First, manually selecting a target area is impractical because of the large number of frames. Second, user-suggested structural/textural information is difficult to obtain, even with edge detection. Therefore, video inpainting needs to incorporate a robust tracking mechanism and an effective structural/textural propagation mechanism. One approach to completing a removed object in video is to apply image completion techniques [1], [3] directly, i.e., to treat each video frame as an independent image. Most image completion techniques [1], [3] rest on one assumption: the target area $\Omega$ has a texture similar to, and a structure continuous with, that of the source area $\Phi$.
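The decomposition of an image into a source area Φ and a target area Ω, together with the assumption that Ω resembles Φ, can be illustrated with a short sketch. It follows the general spirit of exemplar-based matching as in [1]; it is not the video algorithm proposed in this paper. It assumes a grayscale image stored as a 2-D Python list and a boolean mask whose True entries form Ω (every other pixel belongs to Φ); the helper names `decompose` and `best_source_patch` are hypothetical. Each candidate patch lying entirely in Φ is scored by the sum of squared differences (SSD) over the known pixels of the target patch.

```python
from typing import List, Optional, Set, Tuple

# Hypothetical types for this sketch: a grayscale image as a 2-D list of
# intensities, and a mask whose True entries mark pixels in the target Omega.
Image = List[List[float]]
Mask = List[List[bool]]
Pixel = Tuple[int, int]


def decompose(mask: Mask) -> Tuple[Set[Pixel], Set[Pixel]]:
    """Split pixel coordinates into the source Phi and the target Omega.

    Every pixel lands in exactly one set, so Phi | Omega covers the image
    and Phi & Omega is empty by construction.
    """
    phi: Set[Pixel] = set()
    omega: Set[Pixel] = set()
    for r, row in enumerate(mask):
        for c, in_hole in enumerate(row):
            (omega if in_hole else phi).add((r, c))
    return phi, omega


def best_source_patch(img: Image, mask: Mask, center: Pixel,
                      half: int) -> Tuple[Optional[Pixel], float]:
    """Brute-force SSD search for the source patch most similar to the
    partially known target patch centred at `center`.

    The patch side length is 2 * half + 1 (half=1 gives 3 x 3). Candidates
    must lie entirely in Phi; the SSD compares only target pixels that are
    known (inside the image and not in Omega). Returns (None, inf) if no
    candidate patch fits.
    """
    rows, cols = len(img), len(img[0])
    cr, cc = center
    offsets = [(dr, dc) for dr in range(-half, half + 1)
               for dc in range(-half, half + 1)]
    best: Optional[Pixel] = None
    best_cost = float("inf")
    for r in range(half, rows - half):
        for c in range(half, cols - half):
            # Skip candidate patches that touch the hole.
            if any(mask[r + dr][c + dc] for dr, dc in offsets):
                continue
            cost = 0.0
            for dr, dc in offsets:
                tr, tc = cr + dr, cc + dc
                if 0 <= tr < rows and 0 <= tc < cols and not mask[tr][tc]:
                    diff = img[tr][tc] - img[r + dr][c + dc]
                    cost += diff * diff
            if cost < best_cost:  # ties keep the first candidate found
                best, best_cost = (r, c), cost
    return best, best_cost
```

For example, on a 5 × 6 image of alternating vertical stripes (pixel value `c % 2`) with a one-pixel hole at (2, 3), `best_source_patch(img, mask, (2, 3), 1)` returns the candidate center (1, 1) with cost 0.0, and copying that candidate's center value fills the hole correctly. The method of [1] also orders the fill by a priority term and copies whole patches; this sketch omits both to keep the matching step visible.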
Therefore, the source and target areas are divided into small, equal-size patches (e.g., 3 × 3 or 5 × 5 pixels). Patches from the source area, using a sophisticated matching mecha-
1051-8215/$25.00 © 2009 IEEE