IEEE TRANSACTIONS ON IMAGE PROCESSING, VOL. 21, NO. 5, MAY 2012 2747
Co-Transduction for Shape Retrieval
Xiang Bai, Bo Wang, Cong Yao, Wenyu Liu, and Zhuowen Tu
Abstract—In this paper, we propose a new shape/object retrieval
algorithm, namely, co-transduction. The performance of a retrieval
system depends critically on the accuracy of the adopted similarity
measures (distances or metrics). In shape/object retrieval, ideally,
intraclass objects should have smaller distances than interclass
objects. However, it is a difficult task to design an ideal metric
to account for the large intraclass variation. Different types of
measures may focus on different aspects of the objects: for ex-
ample, measures computed based on contours and skeletons are
often complementary to each other. Our goal is to develop an
algorithm to fuse different similarity measures for robust shape
retrieval through a semisupervised learning framework. We name
our method co-transduction, which is inspired by the co-training
algorithm. Given two similarity measures and a query shape,
the algorithm iteratively retrieves the most similar shapes using
one measure and assigns them to a pool for the other measure
to do a re-ranking, and vice versa. Using co-transduction, we
achieved a bull’s-eye score of 97.72% on the MPEG-7 data set,
improving on the state-of-the-art performance. We also
present an algorithm called tri-transduction to fuse multiple-input
similarities, and it achieved 99.06% on the MPEG-7 data set. Our
algorithm is general and can be applied directly to any input
similarity measures/metrics; it is not limited to object shape
retrieval and can be applied to other ranking/retrieval tasks.
Index Terms—Graph transduction, object retrieval, shape re-
trieval, similarity measure.
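As a rough illustration of the alternating scheme summarized in the abstract, the following Python sketch lets two measures exchange their top-ranked shapes through per-measure pools. It is not the implementation used in this paper: the nearest-labeled-neighbor re-ranker is a simple stand-in for graph transduction, and the function names, the parameters `rounds` and `p`, the final averaging step, and the toy 2-D points are all assumptions made for illustration.

```python
import math


def pairwise(points):
    """Euclidean distance matrix for 2-D points (toy stand-in for a shape measure)."""
    return [[math.dist(a, b) for b in points] for a in points]


def rerank(dist, labeled):
    """Stand-in re-ranker: distance from each item to its nearest labeled item."""
    return [min(dist[i][j] for j in labeled) for i in range(len(dist))]


def top_p(scores, p, exclude):
    """Indices of the p best-scoring items that are not already in `exclude`."""
    candidates = [i for i in range(len(scores)) if i not in exclude]
    return sorted(candidates, key=lambda i: scores[i])[:p]


def co_transduction_sketch(d1, d2, query, rounds=1, p=1):
    """Alternate between two measures, each feeding the other's labeled pool."""
    pool1, pool2 = {query}, {query}  # labeled pools used by measure 1 and measure 2
    for _ in range(rounds):
        s1, s2 = rerank(d1, pool1), rerank(d2, pool2)
        to_pool2 = top_p(s1, p, pool2)  # measure 1's best retrievals go to measure 2
        to_pool1 = top_p(s2, p, pool1)  # and vice versa
        pool2.update(to_pool2)
        pool1.update(to_pool1)
    s1, s2 = rerank(d1, pool1), rerank(d2, pool2)
    # Averaging the two final score lists is an assumption of this sketch.
    return [(a + b) / 2 for a, b in zip(s1, s2)]


# Toy data (hypothetical): items 0-2 form one class, items 3-5 another.
# Measure 1 misplaces item 2; measure 2 misplaces item 1.
pts1 = [(0, 0), (1, 0), (0, 5), (4, 0), (5, 0), (6, 0)]
pts2 = [(0, 0), (0, 5), (1, 0), (4, 0), (5, 0), (6, 0)]
d1, d2 = pairwise(pts1), pairwise(pts2)

scores = co_transduction_sketch(d1, d2, query=0)
ranking = sorted(range(len(scores)), key=lambda i: scores[i])
print(ranking[:3])
```

In this toy setting, each measure on its own ranks an item of the other class among the three nearest to the query, whereas the exchanged pools let each measure recover the class member it misplaced.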
I. INTRODUCTION
SHAPE-BASED object retrieval is an important task in computer
vision. Given a query object, the most similar objects
are retrieved from a database based on a certain similarity/dis-
tance measure, whose choice largely decides the performance
of a retrieval system. Therefore, it is of critical importance to
have a faithful similarity measure that accounts for the large
intraclass and instance-level variation in configuration, nonrigid
transformation, and part change. Ideally, such a similarity
measure should yield smaller distances between the variants of
a particular object than between that object and any other, as well
as smaller distances between intraclass objects than between
interclass objects. However, designing such a measure for the general
retrieval task is challenging. Fig. 1 gives an illustration in which a
horse might have a smaller distance to a dog (based on their
contours) than to another horse, whereas the human visual
system can still identify them correctly.
Fig. 1. A horse in (a) may look more similar to a dog in (b) than to another
horse in (c).
Manuscript received November 24, 2010; revised June 21, 2011; accepted
September 10, 2011. Date of publication September 29, 2011; date of current
version April 18, 2012. This work was supported in part by the National Natural
Science Foundation of China under Grant 60903096 and Grant 60873127, by
the Office of Naval Research under Grant N000140910099, and by the National
Science Foundation under CAREER Award IIS-0844566. The associate editor
coordinating the review of this manuscript and approving it for publication was
Dr. Arun A. Ross.
X. Bai, C. Yao, and W. Liu are with the Department of Electronics and
Information Engineering, Huazhong University of Science and Technology,
Wuhan 430074, China (e-mail: xbai@hust.edu.cn; yaocong2010@gmail.com;
liuwy@hust.edu.cn).
B. Wang was with the Department of Electronics and Information
Engineering, Huazhong University of Science and Technology, Wuhan 430074,
China. He is now with the Department of Computer Science, University of
Toronto, Toronto, ON M5S 3G4, Canada (e-mail: wangbo.yunze@gmail.com).
Z. Tu is with Microsoft Research Asia, Beijing 100080, China, and also with
the Laboratory of Neuro Imaging, Department of Neurology, University of
California, Los Angeles, CA 90095 USA (e-mail: ztu@loni.ucla.edu).
Color versions of one or more of the figures in this paper are available online
at http://ieeexplore.ieee.org.
Digital Object Identifier 10.1109/TIP.2011.2170082
In this paper, we refer to shape as the contour of an object sil-
houette. Building correspondences is often the first step in com-
puting the shape difference, but it is challenging: Two shapes
may not have direct correspondences, regardless of whether they are
represented as sparse points, closed contours, or parametric
functions. For example, two shapes with the same contour but
different starting points are typically considered the same shape.
Therefore, the similarity between two shapes is often
measured in two ways: 1) by computing the direct difference
in features extracted from shape contours, which are invariant to
the choice of starting points and robust to a certain degree of de-
formation, such as moments [1] and Fourier descriptors [2]; and
2) by performing matching to find the detailed pointwise
correspondences from which the differences are computed [3]–[8]. The
latter has recently become dominant owing to its ability to capture
intrinsic properties, thus leading to more accurate similarity measures.
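The first strategy can be illustrated with a small Python sketch of a magnitude-only Fourier descriptor, which shows why such features do not depend on the contour's starting point. This is a generic textbook-style construction and not necessarily the descriptor of [2]; the function names, the number of coefficients, the normalization by the first harmonic, and the lobed test shapes are assumptions made for illustration.

```python
import cmath
import math


def fourier_descriptor(contour, n_coeffs=8):
    """Magnitude-only Fourier descriptor of a closed contour.

    contour: list of (x, y) points sampled in order along the boundary.
    Returns |F_1|..|F_n_coeffs|, each divided by |F_1|.
    """
    n = len(contour)
    z = [complex(x, y) for x, y in contour]
    mags = []
    for k in range(1, n_coeffs + 1):  # skipping k = 0 removes translation
        coeff = sum(z[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n))
        mags.append(abs(coeff))  # magnitude discards start-point and rotation phase
    return [m / mags[0] for m in mags]  # removes scale (assumes |F_1| > 0)


def descriptor_distance(contour_a, contour_b, n_coeffs=8):
    """Euclidean distance between the descriptors of two contours."""
    fa = fourier_descriptor(contour_a, n_coeffs)
    fb = fourier_descriptor(contour_b, n_coeffs)
    return math.dist(fa, fb)


def lobed_contour(lobes, n_points=64):
    """Hypothetical test shape: r(theta) = 1 + 0.3 * cos(lobes * theta)."""
    pts = []
    for t in range(n_points):
        theta = 2 * math.pi * t / n_points
        r = 1 + 0.3 * math.cos(lobes * theta)
        pts.append((r * math.cos(theta), r * math.sin(theta)))
    return pts


three = lobed_contour(3)
shifted = three[17:] + three[:17]  # same contour, different starting point
five = lobed_contour(5)

print(descriptor_distance(three, shifted))  # close to zero (floating-point error only)
print(descriptor_distance(three, five))     # clearly larger
```

Shifting the starting point only multiplies each Fourier coefficient by a unit-magnitude phase factor, so taking magnitudes leaves the descriptor unchanged; the price is that phase information, and with it some shape detail, is discarded.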
Bai et al. [9] explored the group contextual information on
different shapes to improve the efficiency of shape retrieval on
several standard data sets [10], [11]. The basic idea was to use
shapes as each other’s contexts in propagation to reduce the
distances between intraclass objects. This was implemented with
a graph-based transduction approach named label propagation
(LP) [12]. Later, several other graph-based transduction
methods were suggested for shape retrieval [13], [14]. In
addition, the method in [14] further improved the results by adding
“ghost points,” which were constructed based on the query shape
and its nearest neighbors from the database. Egozi et al. [15]
proposed a contextual similarity function, named meta simi-
larity, which characterizes a given object by its similarity to
its k-nearest neighbor (k-NN) objects. An interesting distance
learning method called contextual dissimilarity measure (CDM)