IEEE TRANSACTIONS ON IMAGE PROCESSING, VOL. 21, NO. 4, APRIL 2012 2119
Interactive Image Segmentation Using Dirichlet
Process Multiple-View Learning
Lei Ding, Alper Yilmaz, and Rong Yan
Abstract—Segmenting semantically meaningful whole objects
from images is a challenging problem, and it becomes especially
so without higher-level commonsense reasoning. In this paper,
we present an interactive segmentation framework that integrates
image appearance and boundary constraints in a principled way
to address this problem. In particular, we assume that small sets
of pixels, which are referred to as seed pixels, are labeled as the
object and background. The seed pixels are used to estimate
the labels of the unlabeled pixels using Dirichlet process mul-
tiple-view learning, which leverages 1) multiple-view learning that
integrates appearance and boundary constraints and 2) Dirichlet
process mixture-based nonlinear classification that simultaneously
models image features and discriminates between the object and
background classes. With the proposed learning and inference
algorithms, our segmentation framework is experimentally shown
to produce both quantitatively and qualitatively promising results
on a standard dataset of images. In particular, our proposed
framework is able to segment whole objects from images given
insufficient seeds.
Index Terms—Dirichlet processes, image segmentation, proba-
bilistic models.
I. INTRODUCTION
Despite many years of research, unsupervised image segmentation
techniques without human interaction still do
not produce satisfactory results. Fully automated segmentation
is an ill-posed problem because there is neither a
clear definition of a correct segmentation nor an objective measure
of the goodness of a segment. To produce a semantically
meaningful image segmentation, it is essential to take a
priori information about the image into account. This issue has
been addressed in the literature as interactive image segmen-
tation, which takes human inputs through a set of strokes or a
trimap [5], [29] that provides labeled pixels (called seeds) for
the objects and backgrounds and aims to produce semantically
meaningful object regions. Although interactive segmentation
has drawn much attention [7], [12], [13], [15], [19], [33], little
existing work has systematically studied the problem of insufficient
seeds.
Manuscript received May 02, 2010; revised February 20, 2011 and September
18, 2011; accepted November 30, 2011. Date of publication December 22, 2011;
date of current version March 21, 2012. The associate editor coordinating the
review of this manuscript and approving it for publication was Prof. Laurent
Younes.
L. Ding is with Intent Media Inc., New York, NY 10014 USA (e-mail: lei-
ding326@gmail.com).
A. Yilmaz is with the Photogrammetric Computer Vision Laboratory, The
Ohio State University, Columbus, OH 43210 USA.
R. Yan is with Facebook, Inc., Palo Alto, CA 94304 USA.
Color versions of one or more of the figures in this paper are available online
at http://ieeexplore.ieee.org.
Digital Object Identifier 10.1109/TIP.2011.2181398
Fig. 1. (Left) Interactive image segmentation task with red and blue denoting
seeds for the object and background. With the proposed framework, the whole
palm tree can be cut out even when there is no seed at the trunk. (Right)
Boundary map contains strong cues for segment labeling. We show how to fuse
appearance and boundary information in segmentation.
Fig. 2. Superpixels, computed with the normalized-cut method as adopted by
[28], overlaid on the penguin image. Only a small number of superpixels are
shown for visualization purposes.
For instance, when interactive segmentation is performed on the image
in the left panel of Fig. 1 and there is no seed at the trunk of the
tree, a typical interactive segmentation method would fail to segment
the trunk as part of the tree. The reason is that the trunk and leaves
have very different color components, and therefore, there are
insufficient cues for the seeds on the leaves to influence the
segmentation decisions on the trunk.
A partial solution to this problem is through active learning
[30], a framework that allows the learning algorithm to ask for
informative labeled examples at certain costs. However, addi-
tional labeled information through user interaction is not always
available. Thus, we propose an automatic approach that
addresses the case in which too few seeds are provided.
Such situations can arise in real applications when an inexperienced
user supplies an insufficient number of seeds.
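The failure mode described above can be illustrated with a small sketch. It is not the method proposed in this paper: it is a hypothetical nearest-seed color classifier that uses the appearance cue only, and every RGB value and function name in it is an assumption made for illustration. With object seeds placed only on the leaves, a trunk-colored pixel ends up closer in color to a background seed and is therefore mislabeled.

```python
# Illustrative sketch only: a nearest-seed color classifier that uses
# appearance alone. This is NOT the paper's method; all colors and
# names are hypothetical.

def sq_dist(a, b):
    """Squared Euclidean distance between two RGB triples."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def label_by_appearance(pixel, object_seeds, background_seeds):
    """Return the label of whichever seed set contains the closest color."""
    d_obj = min(sq_dist(pixel, s) for s in object_seeds)
    d_bg = min(sq_dist(pixel, s) for s in background_seeds)
    return "object" if d_obj < d_bg else "background"

# Assumed seed colors: object seeds lie only on green leaves,
# background seeds lie on sky and sand.
object_seeds = [(40, 140, 50), (30, 120, 40)]
background_seeds = [(120, 180, 230), (190, 160, 120)]

leaf_pixel = (35, 130, 45)    # similar in color to the object seeds
trunk_pixel = (140, 110, 80)  # brown trunk; no object seed resembles it

leaf_label = label_by_appearance(leaf_pixel, object_seeds, background_seeds)
trunk_label = label_by_appearance(trunk_pixel, object_seeds, background_seeds)

print(leaf_label)   # the leaf pixel follows the leaf seeds
print(trunk_label)  # the trunk pixel follows the sand-colored background seed
```

In this toy setting, no amount of tuning the color distance recovers the trunk, because the information needed to attach it to the tree is not present in the appearance cue at all.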
Both the human vision and computer vision literature suggest the use
of multiple cues for object perception [9], [27], and the task of
image segmentation is no exception. Our approach utilizes cues
from both image appearance and salient boundaries. Specif-
ically, pixels with similar appearance to the seed pixels are
merged together, and so are pixels satisfying similar boundary
constraints when appearance fails to provide sufficient cues.
The boundary constraints from an image are systematically
1057-7149/$26.00 © 2011 IEEE