LAYER EXTRACTION USING GRAPH CUTS AND FEATURE TRACKING

Vardhman Jain 1 , P. J. Narayanan 1
1 Center for Visual Information Technology, IIIT Hyderabad, India
{vardhman@students., pjn@}iiit.net

Keywords: Layer Extraction, Video Segmentation, Graph Cuts, Tracking, Interactive

Abstract

In this paper we present a new method for layer extraction by tracking a non-rigid body with no fixed motion model in a video. The method integrates the graph cuts approach with robust point-based tracking to achieve good tracking of the whole object over the frames of a video. With a little user interaction, our method can perform fine layer extraction in the presence of irregular motion and difficult object boundaries. To achieve this, we apply 3D graph cuts to a pair of frames and propagate the labels obtained in the earlier frame to the new frame using a robust tracking method. The user is shown the results of the layer extraction and can provide extra strokes to improve them.

1 Introduction

Layer extraction has been a topic of research in recent years. Many techniques have been proposed for automatic segmentation of layers [6, 13, 19, 20]. Although automatic segmentation of video is useful in many applications such as compression, coding, and recognition [20], interactive segmentation of images [7, 11] and videos [8, 18] has also been developed recently. The superior quality that these interactive methods achieve with minimal user interaction makes them very attractive. These approaches have objectives similar to those of layer extraction. The extracted layers can be used in many advanced video editing applications, including matting and composition. The problem is also closely related to the object tracking problem, which has itself received a lot of attention over the years.

The method we propose in this paper is based on the generally valid assumptions that objects in videos usually exhibit small motions between frames and that the frames are temporally related.
There are certain issues that discourage the use of techniques that work on one frame at a time and then combine the frames:

1. The object's segmentation over individual frames may not provide temporal continuity.
2. The segmentation information obtained in earlier frames is not used.
3. The technique becomes very slow due to the huge amount of recomputation at every frame.

In our method we try to address these problems. First, we use a multi-frame graph, which helps maintain temporal continuity and carries the segmentation obtained in one frame over to the next frame. Using the assumption of trackability, we also prune a large part of the image from the minimization process, which makes the graph smaller in terms of the number of nodes and edges. Because we use robust tracking, we can automatically provide hard constraints in the target frame, which act as good seeds for the graph cuts minimization. The layer obtained by our approach can then be used in a variety of other applications such as video cutout, matting, composition, and object removal.

The paper is organized as follows. Section 2 describes the related work. Section 3 describes our approach in detail. Results are demonstrated in Section 4.

2 Related Work

The layer extraction problem is closely related to several other problems, such as image and video segmentation, image and video matting, and interactive image editing. In addition, video segmentation has many applications, including advanced video editing and object removal [21].

Image Segmentation: The problem of image segmentation has been around for a very long time. Early techniques clustered the image pixels according to some similarity criterion, such as intensity or color similarity, together with spatial coherence [4, 17].
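
To make the clustering idea concrete, the following is a minimal sketch that groups the pixels of a small grayscale image with plain k-means over (intensity, x, y) features. It is an illustration only and is not the specific method of [4, 17]; the function name `cluster_pixels`, the `spatial_weight` parameter, and the initialisation rule are assumptions introduced here.

```python
def cluster_pixels(image, k=2, spatial_weight=0.1, iterations=10):
    """Group the pixels of a grayscale image (a list of rows) into k clusters.

    Hypothetical helper for illustration: plain k-means on per-pixel
    features, not the algorithm of any cited paper.
    """
    height, width = len(image), len(image[0])
    # One feature vector per pixel: intensity plus down-weighted coordinates,
    # so that pixels that are similar AND nearby tend to share a cluster.
    features = [
        (float(image[y][x]), spatial_weight * x, spatial_weight * y)
        for y in range(height)
        for x in range(width)
    ]

    def dist2(a, b):
        # Squared Euclidean distance between two feature vectors.
        return sum((p - q) ** 2 for p, q in zip(a, b))

    # Deterministic initialisation (an arbitrary choice for this sketch):
    # spread the initial centres over the pixels sorted by intensity.
    ordered = sorted(features)
    centres = [ordered[(2 * i + 1) * len(ordered) // (2 * k)] for i in range(k)]

    labels = [0] * len(features)
    for _ in range(iterations):
        # Assignment step: each pixel joins its nearest centre.
        labels = [
            min(range(k), key=lambda c: dist2(f, centres[c])) for f in features
        ]
        # Update step: each centre moves to the mean of its members.
        for c in range(k):
            members = [f for f, lab in zip(features, labels) if lab == c]
            if members:
                centres[c] = tuple(sum(v) / len(members) for v in zip(*members))

    # Reshape the flat label list back into image rows.
    return [labels[y * width:(y + 1) * width] for y in range(height)]


# Usage: a tiny image with a dark left half and a bright right half.
image = [
    [10, 12, 200, 210],
    [11, 13, 205, 207],
    [9, 14, 198, 202],
]
for row in cluster_pixels(image, k=2):
    print(row)
```

The `spatial_weight` factor controls how strongly spatial coherence counts relative to intensity similarity; setting it to zero reduces the sketch to clustering on intensity alone.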
Later, methods such as image snapping [5] and the intelligent scissors [9] tool in Adobe Photoshop were developed. They allow the user to obtain a contour around the object by roughly tracing its boundary with the mouse, rather than having to drag the mouse precisely along the boundary. These methods rely on local features such as gradient information and Laplacian zero-crossing measures, and therefore they do not perform very well on highly textured