A Closed-Form Solution to Natural Image Matting

Anat Levin, Dani Lischinski, and Yair Weiss

Abstract: Interactive digital matting, the process of extracting a foreground object from an image based on limited user input, is an important task in image and video editing. From a computer vision perspective, this task is extremely challenging because it is massively ill-posed: at each pixel we must estimate the foreground and the background colors, as well as the foreground opacity ("alpha matte") from a single color measurement. Current approaches either restrict the estimation to a small part of the image, estimating foreground and background colors based on nearby pixels where they are known, or perform iterative nonlinear estimation by alternating foreground and background color estimation with alpha estimation. In this paper, we present a closed-form solution to natural image matting. We derive a cost function from local smoothness assumptions on foreground and background colors and show that in the resulting expression, it is possible to analytically eliminate the foreground and background colors to obtain a quadratic cost function in alpha. This allows us to find the globally optimal alpha matte by solving a sparse linear system of equations. Furthermore, the closed-form formula allows us to predict the properties of the solution by analyzing the eigenvectors of a sparse matrix, closely related to matrices used in spectral image segmentation algorithms. We show that high-quality mattes for natural images may be obtained from a small amount of user input.

Index Terms: Matting, interactive image editing, spectral segmentation.

1 INTRODUCTION

Natural image matting and compositing is of central importance in image and video editing. Formally, image matting methods take as input an image $I$, which is assumed to be a composite of a foreground image $F$ and a background image $B$.
The color of the $i$th pixel is assumed to be a linear combination of the corresponding foreground and background colors,

$$I_i = \alpha_i F_i + (1 - \alpha_i) B_i, \tag{1}$$

where $\alpha_i$ is the pixel's foreground opacity. In natural image matting, all quantities on the right-hand side of the compositing equation (1) are unknown. Thus, for a three-channel color image, at each pixel, there are three equations and seven unknowns (three components each of $F_i$ and $B_i$, plus $\alpha_i$). Obviously, this is a severely underconstrained problem, and user interaction is required to extract a good matte.

Most recent methods expect the user to provide a trimap as a starting point; an example is shown in Fig. 1e. The trimap is a rough (typically hand-drawn) segmentation of the image into three regions: foreground (shown in white), background (shown in black), and unknown (shown in gray). Given the trimap, these methods typically solve for $F$, $B$, and $\alpha$ simultaneously. This is typically done by iterative nonlinear optimization, alternating the estimation of $F$ and $B$ with that of $\alpha$. In practice, this means that for good results, the unknown regions in the trimap must be as small as possible. As a consequence, trimap-based approaches typically experience difficulty handling images with a significant portion of mixed pixels or when the foreground object has many holes [19]. In such challenging cases, a great deal of experience and user interaction may be necessary to construct a trimap that would yield a good matte. Another problem with the trimap interface is that the user cannot directly influence the matte in the most important part of the image: the mixed pixels.

In this paper, we present a new closed-form solution for extracting the alpha matte from a natural image. We derive a cost function from local smoothness assumptions on foreground and background colors $F$ and $B$ and show that in the resulting expression, it is possible to analytically eliminate $F$ and $B$, yielding a quadratic cost function in $\alpha$.
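As a rough illustration of why a quadratic cost in $\alpha$ can be minimized with a single linear solve, the sketch below uses a deliberately simplified stand-in, not the cost function derived in this paper: a 1-D chain of pixels with the smoothness cost $\sum_i (\alpha_i - \alpha_{i+1})^2$ plus a soft penalty $\lambda \sum_{i \in S} (\alpha_i - s_i)^2$ that ties the scribbled pixels $S$ to user-supplied values $s_i$. Setting the gradient to zero gives a tridiagonal, and therefore sparse, linear system. The function names, the chain cost, and the weight `lam` are assumptions made only for this example.

```python
def solve_tridiagonal(lower, diag, upper, rhs):
    """Solve a tridiagonal system with the Thomas algorithm (no pivoting).

    Row i reads: lower[i]*x[i-1] + diag[i]*x[i] + upper[i]*x[i+1] = rhs[i].
    lower[0] and upper[-1] are unused. Assumes n >= 2 and a well-conditioned
    (here: symmetric positive definite) matrix.
    """
    n = len(diag)
    c = [0.0] * n  # modified upper diagonal
    d = [0.0] * n  # modified right-hand side
    c[0] = upper[0] / diag[0]
    d[0] = rhs[0] / diag[0]
    # Forward elimination.
    for i in range(1, n):
        denom = diag[i] - lower[i] * c[i - 1]
        c[i] = upper[i] / denom
        d[i] = (rhs[i] - lower[i] * d[i - 1]) / denom
    # Back substitution.
    x = [0.0] * n
    x[-1] = d[-1]
    for i in range(n - 2, -1, -1):
        x[i] = d[i] - c[i] * x[i + 1]
    return x


def solve_alpha_chain(n, scribbles, lam=100.0):
    """Minimize the stand-in quadratic cost on a chain of n pixels.

    scribbles maps a pixel index to its user-specified alpha (0 or 1).
    The zero-gradient condition is (L + lam*D) alpha = lam*D*s, where L is
    the chain graph Laplacian and D marks the scribbled pixels.
    """
    # Chain graph Laplacian: degree on the diagonal, -1 between neighbors.
    diag = [2.0] * n
    diag[0] = diag[-1] = 1.0
    lower = [0.0] + [-1.0] * (n - 1)
    upper = [-1.0] * (n - 1) + [0.0]
    rhs = [0.0] * n
    # Soft scribble constraints add lam to the diagonal and lam*s to the rhs.
    for i, value in scribbles.items():
        diag[i] += lam
        rhs[i] += lam * value
    return solve_tridiagonal(lower, diag, upper, rhs)


# Hypothetical usage: pixel 0 scribbled as background, pixel 4 as foreground.
alpha = solve_alpha_chain(5, {0: 0.0, 4: 1.0})
print([round(a, 4) for a in alpha])
```

With the two end pixels scribbled as background (0) and foreground (1), the solution is close to a linear ramp between them, and no iteration or initial guess is needed. For a real image, the chain Laplacian would be replaced by the matrix of the actual cost function and the tridiagonal solver by a general sparse solver.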
The alpha matte produced by our method is the global optimum of this cost function, which may be obtained by solving a sparse linear system. Since our approach computes $\alpha$ directly and without requiring reliable estimates for $F$ and $B$, a modest amount of user input (such as a sparse set of scribbles) is often sufficient for extracting a high-quality matte. Furthermore, our closed-form formulation enables one to understand and predict the properties of the solution by examining the eigenvectors of a sparse matrix, closely related to matrices used in spectral image segmentation algorithms. In addition to providing a solid theoretical basis for our approach, such analysis can provide useful hints to the user regarding where in the image scribbles should be placed.

IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 30, no. 2, February 2008. The authors are with the School of Computer Science and Engineering, The Hebrew University of Jerusalem, 91905, Israel. E-mail: {alevin, danix, yweiss}@cs.huji.ac.il. Manuscript received 7 Aug. 2006; revised 26 Mar. 2007; accepted 30 Apr. 2007; published online 16 May 2007. Recommended for acceptance by H. Shum. For information on obtaining reprints of this article, please send e-mail to: tpami@computer.org, and reference IEEECS Log Number TPAMI-0582-0806. Digital Object Identifier no. 10.1109/TPAMI.2007.1177. 0162-8828/08/\$25.00 © 2008 IEEE. Published by the IEEE Computer Society.

1.1 Previous Work

Most existing methods for natural image matting require the input image to be accompanied by a trimap [1], [2], [5], [6], [14], [17], labeling each pixel as foreground, background, or unknown. The goal of the method is to solve the compositing equation (1) for the unknown pixels. This is typically done by exploiting some local regularity assumptions on $F$ and $B$ to predict their values for each pixel in the unknown region. In the Corel KnockOut algorithm [2], $F$