IEEE TRANSACTIONS ON IMAGE PROCESSING, VOL. 21, NO. 4, APRIL 2012, p. 2119

Interactive Image Segmentation Using Dirichlet Process Multiple-View Learning

Lei Ding, Alper Yilmaz, and Rong Yan

Abstract—Segmenting semantically meaningful whole objects from images is a challenging problem, and it becomes especially so without higher-level common-sense reasoning. In this paper, we present an interactive segmentation framework that integrates image appearance and boundary constraints in a principled way to address this problem. In particular, we assume that small sets of pixels, referred to as seed pixels, are labeled as object or background. The seed pixels are used to estimate the labels of the unlabeled pixels using Dirichlet process multiple-view learning, which leverages 1) multiple-view learning that integrates appearance and boundary constraints and 2) Dirichlet process mixture-based nonlinear classification that simultaneously models image features and discriminates between the object and background classes. With the proposed learning and inference algorithms, our segmentation framework is experimentally shown to produce both quantitatively and qualitatively promising results on a standard dataset of images. In particular, our proposed framework is able to segment whole objects from images given insufficient seeds.

Index Terms—Dirichlet processes, image segmentation, probabilistic models.

I. INTRODUCTION

Despite many years of research, unsupervised image segmentation techniques without human interaction still do not produce satisfactory results. Fully automated segmentation is an ill-posed problem because there is neither a clear definition of a correct segmentation nor an objective measure of the goodness of a segment. To produce a semantically meaningful image segmentation, it is essential to take a priori information about the image into account.
This issue has been addressed in the literature as interactive image segmentation, which takes human input through a set of strokes or a trimap [5], [29] that provides labeled pixels (called seeds) for the objects and backgrounds and aims to produce semantically meaningful object regions. Although interactive segmentation has drawn much attention [7], [12], [13], [15], [19], [33], little existing work has systematically studied the problem of insufficient seeds.

Manuscript received May 02, 2010; revised February 20, 2011 and September 18, 2011; accepted November 30, 2011. Date of publication December 22, 2011; date of current version March 21, 2012. The associate editor coordinating the review of this manuscript and approving it for publication was Prof. Laurent Younes.
L. Ding is with Intent Media Inc., New York, NY 10014 USA (e-mail: lei- ding326@gmail.com).
A. Yilmaz is with the Photogrammetric Computer Vision Laboratory, The Ohio State University, Columbus, OH 43210 USA.
R. Yan is with Facebook, Inc., Palo Alto, CA 94304 USA.
Color versions of one or more of the figures in this paper are available online at http://ieeexplore.ieee.org.
Digital Object Identifier 10.1109/TIP.2011.2181398

Fig. 1. (Left) Interactive image segmentation task with red and blue denoting seeds for the object and background. With the proposed framework, the whole palm tree can be cut out even when there is no seed at the trunk. (Right) Boundary map contains strong cues for segment labeling. We show how to fuse appearance and boundary information in segmentation.

Fig. 2. Superpixels overlaid on the penguin image using the normalized-cut method as adopted by [28]. A small number of superpixels are shown for visualization purposes.

For instance, when interactive segmentation is performed on the image in the left panel of Fig. 1 and there is no seed at the trunk of the tree, a typical interactive segmentation method would fail to segment the trunk as part of the tree.
The reason is that the trunk and leaves have very different color components, and therefore the seeds on the leaves do not provide a sufficient cue to influence the segmentation decisions on the trunk. A partial solution to this problem is active learning [30], a framework that allows the learning algorithm to ask for informative labeled examples at a certain cost. However, additional labeled information through user interaction is not always available. Thus, we propose an automatic approach to address the case in which too few seeds are provided. Such situations may arise in real applications where an inexperienced user supplies an insufficient number of seeds.

1057-7149/$26.00 © 2011 IEEE

Both the human and computer vision literature suggest the use of multiple cues for object perception [9], [27], and the task of image segmentation is no exception. Our approach utilizes cues from both image appearance and salient boundaries. Specifically, pixels with similar appearance to the seed pixels are merged together, and so are pixels satisfying similar boundary constraints when appearance fails to provide sufficient cues. The boundary constraints from an image are systematically