High Resolution Matting via Interactive Trimap Segmentation

Technical report corresponding to the CVPR’08 paper
TR-188-2-2008-04

Christoph Rhemann 1*, Carsten Rother 2, Alex Rav-Acha 3, Toby Sharp 2

1 Institute for Software Technology and Interactive Systems, Vienna University of Technology, Austria
2 Microsoft Research Cambridge, Cambridge, UK
3 The Weizmann Institute of Science, Rehovot, Israel

Abstract

We present a new approach to the matting problem which splits the task into two steps: interactive trimap extraction followed by trimap-based alpha matting. By doing so we gain considerably in terms of speed and quality and are able to deal with high resolution images. This paper has three contributions: (i) a new trimap segmentation method using parametric max-flow; (ii) an alpha matting technique for high resolution images with a new gradient preserving prior on alpha; (iii) a database of 27 ground truth alpha mattes of still objects, which is considerably larger than previous databases and also of higher quality. The database is used to train our system and to validate that both our trimap extraction and our matting method improve on state-of-the-art techniques.

1. Introduction

Natural image matting addresses the problem of extracting an object from its background by recovering the opacity and foreground color of each pixel. Formally, the observed color C is a combination of foreground (F) and background (B) colors,

$$C = \alpha F + (1 - \alpha) B \qquad (1)$$

interpolated by the opacity value α. (This simplified model will be reconsidered later.) Matting is a highly under-constrained problem and hence user interaction is essential. In this introduction we will first consider the user aspects of the matting problem and then compare the different matting approaches themselves.

Previous work in this domain can be broadly classified into three types of user interfaces. The first class of interface is based on trimaps [3, 16, 21, 19, 4].
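As a brief aside, the compositing model of eq. (1) can be illustrated with a minimal Python sketch. It is not taken from the paper: the function names `composite` and `alpha_from_known_colors` are hypothetical, colors are assumed to be RGB tuples with channels in [0, 1], and the least-squares inversion only applies in the idealized case where F and B are already known for the pixel. In general each pixel gives three equations (one per color channel) for seven unknowns (α plus three channels each of F and B), which is one way to see why the problem is under-constrained.

```python
def composite(alpha, fg, bg):
    """Blend one foreground and one background RGB color as in eq. (1)."""
    if not 0.0 <= alpha <= 1.0:
        raise ValueError("alpha must lie in [0, 1]")
    return tuple(alpha * f + (1.0 - alpha) * b for f, b in zip(fg, bg))


def alpha_from_known_colors(c, fg, bg):
    """Least-squares alpha for one pixel, assuming F and B are known.

    Projects C - B onto F - B in color space and clamps the result to [0, 1].
    """
    diff = [f - b for f, b in zip(fg, bg)]
    denom = sum(d * d for d in diff)
    if denom == 0.0:
        # F == B: every alpha explains the observation equally well.
        raise ValueError("F and B are identical, alpha is not identifiable")
    num = sum((ci - b) * d for ci, b, d in zip(c, bg, diff))
    return min(1.0, max(0.0, num / denom))


# Illustrative values only (not from the paper).
fg = (1.0, 0.5, 0.0)
bg = (0.0, 0.0, 1.0)
c = composite(0.25, fg, bg)
recovered = alpha_from_known_colors(c, fg, bg)
```

When F and B are unknown, as in the unknown region of a trimap, this inversion is not available, and matting methods have to estimate all three quantities from nearby known pixels.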
First the user paints a trimap by hand as accurately as possible, i.e. each pixel is assigned to one of three classes: foreground (F), background (B) or unknown (U) (e.g. fig. 1(middle)). In a perfectly tight trimap the α values in U are above 0 and below 1, and the F and B regions contain only α values that are exactly 1 and 0, respectively. The information from the known regions (F, B) is used to predict the values of F, B and α for each unknown pixel. It has been shown [21], and we will confirm it, that if the trimap is perfect (or nearly perfect), the resulting matte is of very high quality.

The recent soft scissors approach [19] is probably the most sophisticated trimap “paint tool”. It builds on the intelligent scissors approach [13], which gives a hard segmentation, i.e. only F, B labels. In soft scissors the user traces the boundary with a “fat brush”. The brush size is adapted according to the underlying data, and intermediate results of the matte are shown, enhancing the user experience. The main drawback of such a brush tool is that objects with a long boundary or a complicated boundary topology are very tedious to trace, e.g. a tree with many foreground holes or the example in fig. 1(left).

* This work was supported in part by Microsoft Research Cambridge through its PhD Scholarship Programme and a travel sponsorship.

Mainly due to this drawback, the trend in hard segmentation has been to move from boundary selection tools like intelligent scissors [13] to scribble-based region selection tools [2, 15]. This second class of interfaces is more user-friendly since only a few pixels have to be assigned to F or B, and these are ideally far away from the boundary. Impressive results were achieved for hard segmentation [2, 15, 1] and also to some extent for matting [9, 20, 11, 5].
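To make the trimap definition above concrete, the following is a minimal Python sketch, not part of the paper. It assumes an alpha matte stored as a nested list of floats in [0, 1]; the labels and the helper names `tight_trimap`, `is_consistent` and `unknown_fraction` are hypothetical. A trimap is consistent with a matte if F pixels have α = 1 and B pixels have α = 0, and it is perfectly tight if, in addition, U contains only fractional α values.

```python
F, B, U = "F", "B", "U"


def tight_trimap(alpha):
    """Derive the perfectly tight trimap from a ground truth alpha matte."""
    return [[F if a == 1.0 else B if a == 0.0 else U for a in row]
            for row in alpha]


def is_consistent(trimap, alpha):
    """True if every F pixel has alpha 1 and every B pixel has alpha 0."""
    for t_row, a_row in zip(trimap, alpha):
        for t, a in zip(t_row, a_row):
            if (t == F and a != 1.0) or (t == B and a != 0.0):
                return False
    return True


def unknown_fraction(trimap):
    """Fraction of pixels labelled U; smaller means a tighter trimap."""
    labels = [t for row in trimap for t in row]
    return labels.count(U) / len(labels)


# Illustrative 2x3 matte (values are made up).
alpha = [[1.0, 0.6, 0.0],
         [1.0, 1.0, 0.2]]
tight = tight_trimap(alpha)
# A looser, hand-painted style trimap that is still consistent with alpha.
loose = [[F, U, U],
         [U, F, U]]
```

Both trimaps here are consistent with the matte, but the loose one leaves more pixels for the matting step to resolve, which is the situation a tight trimap is meant to avoid.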
Note, a simple approach to obtain a soft matte from a hard segmentation is to run existing trimap-based matting techniques in a band of constant width around the hard segmentation, as done in e.g. [15, 1]. In contrast to this, our work computes an adaptive band which respects the underlying data.

We now review scribble-based matting methods. In [9, 11] a purely local “propagation” approach is taken and no global color information (i.e. from outside a small window) is used. If the assumption holds that all colors within a small window around any unknown pixel lie on a line in color space, then this approach obtains the ground truth matte. We observed in our experiments that this approach obtains good results for relatively tight trimaps (or scribbles), but it per-