Stereoscopic Inpainting: Joint Color and Depth Completion from Stereo Images

Liang Wang (lwangd@cs.uky.edu) and Ruigang Yang (ryang@cs.uky.edu), Center for Visualization and Virtual Environments, University of Kentucky, USA
Hailin Jin (hljin@adobe.com), Advanced Technology Labs, Adobe Systems Incorporated, USA
Minglun Gong (gong@cs.mun.ca), Computer Science Department, Memorial University of Newfoundland, Canada

Abstract

We present a novel algorithm for simultaneous color and depth inpainting. The algorithm takes stereo images and estimated disparity maps as input and fills in missing color and depth information introduced by occlusions or object removal. We first complete the disparities for the occlusion regions using a segmentation-based approach. The completed disparities can then assist the user in labeling objects to be removed. Since parts of the removed regions in one image are visible in the other, we mutually complete the two images through 3D warping. Finally, we complete the remaining unknown regions using a depth-assisted texture synthesis technique, which simultaneously fills in both color and depth. We demonstrate the effectiveness of the proposed algorithm on several challenging data sets.

1. Introduction

Digital photos have become a ubiquitous part of our everyday life. As a result, image inpainting, a digital image processing technique for seamlessly filling holes in an image, has received considerable attention in the research community. While most existing inpainting work focuses on texture completion in a single image, in this paper we address the novel problem of completing both the texture and the depth of a stereo image pair after object removal.

Our proposed stereoscopic inpainting algorithm is designed to jointly complete missing texture and depth by leveraging the following advantages introduced by stereo images.
First, the region to be filled after object removal may be partially visible in the other camera view, reducing the need to entirely "hallucinate" the color in the holes. Secondly, the depth information from stereo matching can be used to differentiate structural elements and guide the texture synthesis process. Lastly, the consistency of the inpainting results across both images and depth maps provides a quality measure, based on which an iterative algorithm can be developed to automatically detect artifacts and refine the completion.

As a counterpart to conventional color completion techniques, our approach utilizes stereo images and depth information to complete complex salient structures that exist in the missing region and to provide more plausible texture synthesis results. Experimental results demonstrate that our completion framework produces images with higher fidelity and fewer artifacts than traditional inpainting methods. Moreover, beyond pure two-dimensional texture synthesis, stereoscopic inpainting can also facilitate many interesting applications in 3D (e.g., view synthesis and image-based modeling), since our algorithm makes it more practical to obtain consistent stereo images and depth maps with undesired objects removed.

1.1. Related work

This work is related to a sizable body of literature on image inpainting, started by the work of [1]. Of particular interest are the example-based approaches [5, 11, 15], which fill missing regions with patches sampled from known areas. To better cope with salient structures in the images, Sun et al. [19] proposed a system that allows the user to specify curves or line segments on which the most salient missing structures reside, and Drori et al. [7] proposed using "points of interest" to further improve the completion quality. Cues from multiple images have also been explored in the past.
Kang et al. [13] used landmarks to match images and then copied warped patches from different images. Wilczkowiak et al. [22] suggested increasing the sampling space by considering patches from images taken from different perspectives. This work differs from both [13] and [22] in that we perform depth estimation from the input images and use the resulting depth to guide the sampling process. [2] also uses depth information from photos to perform completion. However, its input is a video sequence, and the completion process requires a large number of nearby video frames and
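The mutual-completion step outlined in the abstract (filling hole pixels in one view by warping color from the other view through the disparity map) can be sketched roughly as follows. This is a minimal illustration under standard rectified-stereo assumptions, not the paper's implementation: the function name `warp_fill`, the nearest-integer disparity rounding, and the absence of an occlusion check are all simplifying assumptions of ours.

```python
import numpy as np

def warp_fill(left, right, disparity, hole_mask):
    """Fill masked pixels of the left image by sampling the right image
    at the disparity-shifted column x - d (rectified stereo geometry).
    Pixels whose warped coordinate falls outside the right image remain
    unfilled and are reported back in the returned mask."""
    filled = left.copy()
    remaining = hole_mask.copy()
    w = disparity.shape[1]
    ys, xs = np.nonzero(hole_mask)                     # hole pixel coordinates
    xr = xs - np.rint(disparity[ys, xs]).astype(int)   # matching column in right view
    ok = (xr >= 0) & (xr < w)                          # keep only in-bounds matches
    filled[ys[ok], xs[ok]] = right[ys[ok], xr[ok]]
    remaining[ys[ok], xs[ok]] = False                  # these pixels are now filled
    return filled, remaining
```

Pixels still flagged in `remaining` correspond to regions with no valid counterpart in the other view; in the paper's pipeline such leftover regions are handled by the depth-assisted texture synthesis stage.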