Video Inpainting Under Constrained Camera Motion

Kedar A. Patwardhan, Student Member, IEEE, Guillermo Sapiro, Senior Member, IEEE, and Marcelo Bertalmío

Abstract—A framework for inpainting missing parts of a video sequence recorded with a moving or stationary camera is presented in this work. The region to be inpainted is general: it may be still or moving, in the background or in the foreground, and it may occlude one object while being occluded by another. The algorithm consists of a simple preprocessing stage and two steps of video inpainting. In the preprocessing stage, we roughly segment each frame into foreground and background. We use this segmentation to build three image mosaics that help to produce time-consistent results and also improve the performance of the algorithm by reducing the search space. In the first video inpainting step, we reconstruct moving objects in the foreground that are "occluded" by the region to be inpainted. To this end, we fill the gap as much as possible by copying information from the moving foreground in other frames, using a priority-based scheme. In the second step, we inpaint the remaining hole with the background. To accomplish this, we first align the frames and directly copy whenever possible. The remaining pixels are filled in by extending spatial texture synthesis techniques to the spatiotemporal domain. The proposed framework has several advantages over state-of-the-art algorithms that deal with similar types of data and constraints. It permits some camera motion, is simple to implement, is fast, does not require statistical models of the background or the foreground, works well in the presence of rich and cluttered backgrounds, and produces results with no visible blurring or motion artifacts. A number of real examples taken with a consumer hand-held camera are shown in support of these findings.

Index Terms—Camera motion, special effects, texture synthesis, video inpainting.

Manuscript received November 1, 2005; revised July 29, 2006. This work was supported in part by the Office of Naval Research; in part by the National Science Foundation; in part by DARPA; in part by the National Institutes of Health; in part by the National Geospatial-Intelligence Agency, IP-RACINE Project IST-511316; in part by the PNPGC project, reference BFM2003-02125; and in part by the Ramón y Cajal Program. The associate editor coordinating the review of this manuscript and approving it for publication was Dr. Anil Kokaram.

K. A. Patwardhan and G. Sapiro are with the Department of Electrical and Computer Engineering, University of Minnesota, Minneapolis, MN 55455 USA (e-mail: kedar@umn.edu; guille@umn.edu). M. Bertalmío is with the University Pompeu Fabra, Barcelona, Spain (e-mail: marcelo.bertalmio@upf.edu).

Color versions of one or more of the figures in this paper are available online at http://ieeexplore.ieee.org.

Digital Object Identifier 10.1109/TIP.2006.888343

I. INTRODUCTION AND OVERVIEW

A. Introduction to the Video Inpainting Problem

The problem of automatic video restoration in general, and automatic object removal and modification in particular, is beginning to attract the attention of many researchers. In this paper we address a constrained but important case of video inpainting. We assume that the camera motion is approximately parallel to the plane of image projection, and that the scene essentially consists of a stationary background with a moving foreground, both of which may require inpainting. The algorithm described in this paper is able to inpaint objects that move in any fashion but do not change size appreciably. As we will see below, these assumptions are implicitly or explicitly present in most state-of-the-art algorithms for video inpainting, yet they still leave a very challenging task and apply to numerous scenarios. For a detailed discussion of these assumptions, including how they are actually relaxed in the real examples presented here, please refer to Section II-A.
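To make the camera-motion constraint concrete, the sketch below shows one simple way the "align the frames and directly copy" background step from the abstract could be realized: since camera motion is assumed approximately parallel to the image plane, neighboring frames can be registered by a single translation. This is a minimal illustrative sketch, not the implementation used in this paper; the function names (align_shift, fill_background) are hypothetical, phase correlation is just one standard choice of translational registration, and grayscale frames are assumed for brevity.

import numpy as np

def align_shift(ref, frame):
    """Estimate the integer translation aligning `frame` to `ref` by phase
    correlation; valid when camera motion is parallel to the image plane.
    (In practice the hole region should be excluded from or pre-filled
    before registration; omitted here for brevity.)"""
    cross = np.fft.fft2(ref) * np.conj(np.fft.fft2(frame))
    corr = np.fft.ifft2(cross / (np.abs(cross) + 1e-8)).real
    dy, dx = np.unravel_index(np.argmax(corr), corr.shape)
    # Map peak coordinates in [0, N) to signed shifts in [-N/2, N/2).
    if dy > corr.shape[0] // 2:
        dy -= corr.shape[0]
    if dx > corr.shape[1] // 2:
        dx -= corr.shape[1]
    return dy, dx

def fill_background(frame, hole, neighbors):
    """Fill the hole in `frame` by direct copy from translationally aligned
    neighboring frames. `hole` is a boolean mask; `neighbors` is a list of
    (frame, hole_mask) pairs. Returns the filled frame and the mask of
    pixels that no neighbor could supply."""
    out, remaining = frame.copy(), hole.copy()
    for nb, nb_hole in neighbors:
        if not remaining.any():
            break
        dy, dx = align_shift(frame, nb)
        shifted = np.roll(nb, (dy, dx), axis=(0, 1))
        valid = ~np.roll(nb_hole, (dy, dx), axis=(0, 1))
        copy = remaining & valid  # copy only where the neighbor is known
        out[copy] = shifted[copy]
        remaining &= ~copy
    return out, remaining

In this reading, the pixels left in the `remaining` mask after direct copying are exactly the ones handed to the spatiotemporal texture synthesis of the second step.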
A number of algorithms for automatic still image completion have been proposed in the literature [3], [5], [6], [11]. These cannot be generalized in a straightforward manner to address the challenging video completion problem studied in this paper. There has also been some preliminary work on frame-by-frame video inpainting based on partial differential equations (PDEs) [4], following [5]. In [4], the PDE is applied spatially, completing the video frame by frame. This ignores the temporal information that a video provides, and its applicability is thereby limited. Moreover, PDE-based methods interpolate edges smoothly, whereas temporal edges are often more abrupt than spatial edges.

The authors in [24] recently proposed a method for space-time completion of damaged areas in a video sequence. They pose video completion as a global optimization problem, which is inherently computationally very expensive. The work extends to space-time the pioneering nonparametric sampling technique developed for still images by Efros and Leung [13] (a minimal sketch of this sampling scheme is given at the end of this section). This implies the assumption that objects move in a periodic manner and do not significantly change scale, since otherwise the "copy and paste" approach of [13] would fail. Although the results are good, they suffer from several shortcomings. Only low-resolution videos are shown, and oversmoothing is often observed. This is due in part to the fact that pixels are synthesized as a weighted average of the best candidates, and this averaging produces blurring. Also, the camera is static in all the examples in that paper. Though the reason for this is not discussed, it is probably due to the fact that the authors use a very simple motion estimation procedure involving the temporal derivative. We present results comparing with their approach in the experimental section.

An interesting probabilistic video modeling technique has been proposed in [10], with application to video inpainting. The authors define "epitomes" as patch-based probability models that are learned by compiling together a large number of example patches from the input images. These epitomes are then used to synthesize data in the areas of video damage or object removal. The video inpainting results are reported to be similar to those in [24], are primarily low resolution, and also exhibit oversmoothing.

Very interesting work for repairing damaged video has been recently reported in [15]. Their method involves a gamut of different techniques that make the inpainting process very complicated. There is a substantial amount of user interaction: the
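As promised above, the following is a minimal single-image sketch of the nonparametric sampling of Efros and Leung [13]; the names and parameters (synthesize_pixel, the patch half-width) are illustrative only. Note that it copies the single best-matching candidate rather than averaging several; the weighted averaging used in [24] is precisely what introduces the blurring noted above. Extending the scheme to video, as in [24], amounts to comparing spatiotemporal (3-D) patches across frames.

import numpy as np

def synthesize_pixel(image, known, y, x, half=4):
    """Fill image[y, x] by nonparametric sampling in the spirit of Efros and
    Leung [13]: compare the partially known patch around (y, x) against every
    candidate patch whose center is known, and copy the best match's center.
    `image` is a 2-D float array, `known` a boolean mask, and (y, x) is
    assumed at least `half` pixels away from the border."""
    h, w = image.shape
    patch = image[y - half:y + half + 1, x - half:x + half + 1]
    mask = known[y - half:y + half + 1, x - half:x + half + 1].astype(float)

    best_cost, best_val = np.inf, image[y, x]
    for cy in range(half, h - half):           # brute-force scan; a real
        for cx in range(half, w - half):       # implementation would prune
            if not known[cy, cx] or (cy, cx) == (y, x):
                continue
            cand = image[cy - half:cy + half + 1, cx - half:cx + half + 1]
            # Sum of squared differences over the already-known pixels only.
            cost = np.sum(mask * (cand - patch) ** 2)
            if cost < best_cost:
                best_cost, best_val = cost, cand[half, half]
    return best_val  # copy, rather than average, to avoid oversmoothing

In a full synthesizer this routine is applied repeatedly, filling the hole from its boundary inward and updating `known` after each pixel.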