IEEE TRANSACTIONS ON IMAGE PROCESSING, VOL. 21, NO. 5, MAY 2012 2747

Co-Transduction for Shape Retrieval

Xiang Bai, Bo Wang, Cong Yao, Wenyu Liu, and Zhuowen Tu

Abstract—In this paper, we propose a new shape/object retrieval algorithm, namely, co-transduction. The performance of a retrieval system is largely determined by the accuracy of the similarity measures (distances or metrics) it adopts. In shape/object retrieval, intraclass objects should ideally have smaller distances than interclass objects. However, designing an ideal metric that accounts for large intraclass variation is difficult. Different types of measures may focus on different aspects of the objects: for example, measures computed from contours and from skeletons are often complementary to each other. Our goal is to develop an algorithm that fuses different similarity measures for robust shape retrieval through a semisupervised learning framework. We name our method co-transduction, as it is inspired by the co-training algorithm. Given two similarity measures and a query shape, the algorithm iteratively retrieves the most similar shapes using one measure and assigns them to a pool for the other measure to re-rank, and vice versa. Using co-transduction, we achieved a bull’s-eye score of 97.72% on the MPEG-7 data set, improving on the state-of-the-art performance. We also present an algorithm called tri-transduction, which fuses multiple input similarities and achieves 99.06% on the MPEG-7 data set. Our algorithm is general: it operates directly on the input similarity measures/metrics, so it is not limited to object shape retrieval and can be applied to other ranking/retrieval tasks.

Index Terms—Graph transduction, object retrieval, shape retrieval, similarity measure.

I. INTRODUCTION

Shape-based object retrieval is an important task in computer vision.
Given a query object, the most similar objects are retrieved from a database based on a certain similarity/distance measure, whose choice largely determines the performance of a retrieval system. Therefore, it is critically important to have a faithful similarity measure that accounts for the large intraclass and instance-level variation in configuration, nonrigid transformation, and part change.

Manuscript received November 24, 2010; revised June 21, 2011; accepted September 10, 2011. Date of publication September 29, 2011; date of current version April 18, 2012. This work was supported in part by the National Natural Science Foundation of China under Grant 60903096 and Grant 60873127, by the Office of Naval Research under Grant N000140910099, and by the National Science Foundation under CAREER Award IIS-0844566. The associate editor coordinating the review of this manuscript and approving it for publication was Dr. Arun A. Ross.

X. Bai, C. Yao, and W. Liu are with the Department of Electronics and Information Engineering, Huazhong University of Science and Technology, Wuhan 430074, China (e-mail: xbai@hust.edu.cn; yaocong2010@gmail.com; liuwy@hust.edu.cn).

B. Wang was with the Department of Electronics and Information Engineering, Huazhong University of Science and Technology, Wuhan 430074, China. He is now with the Department of Computer Science, University of Toronto, Toronto, ON M5S 3G4, Canada (e-mail: wangbo.yunze@gmail.com).

Z. Tu is with Microsoft Research Asia, Beijing 100080, China, and also with the Laboratory of Neuro Imaging, Department of Neurology, University of California, Los Angeles, CA 90095 USA (e-mail: ztu@loni.ucla.edu).

Color versions of one or more of the figures in this paper are available online at http://ieeexplore.ieee.org.

Digital Object Identifier 10.1109/TIP.2011.2170082

Fig. 1. A horse in (a) may look more similar to a dog in (b) than to another horse in (c).
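Retrieval as described above reduces to sorting the database by distance to the query. The bull’s-eye measure quoted in the abstract scores such rankings: as it is commonly defined on MPEG-7 (1400 shapes in 70 classes of 20), every shape serves as a query, and the same-class shapes among its 40 nearest results are counted, out of a maximum of 20. The sketch below assumes equal-sized classes and counts the query itself as a hit; the function name `bullseye_score` and the toy data are hypothetical, not part of the original work.

```python
def bullseye_score(dist, labels):
    """Bull's-eye score of an all-pairs distance matrix (illustrative sketch).

    dist[i][j] is the distance between shapes i and j; labels[i] is the class
    of shape i. Assumes every class has the same size m (20 on MPEG-7), so each
    query inspects its 2*m nearest shapes (itself included) and can score at
    most m hits.
    """
    n = len(labels)
    m = labels.count(labels[0])  # class size, assumed uniform across classes
    hits = 0
    for q in range(n):
        # Retrieval: rank the whole database by increasing distance to the query.
        ranked = sorted(range(n), key=lambda j: dist[q][j])
        # Count same-class shapes among the 2*m top-ranked results.
        hits += sum(1 for j in ranked[:2 * m] if labels[j] == labels[q])
    return hits / (n * m)


# Hypothetical toy data: 6 "shapes" on a line, 3 classes of 2, distance = |x_i - x_j|.
pos = [0, 1, 10, 11, 20, 21]
dist = [[abs(a - b) for b in pos] for a in pos]
print(bullseye_score(dist, [0, 0, 1, 1, 2, 2]))  # classes match the clusters
print(bullseye_score(dist, [0, 2, 1, 1, 2, 0]))  # classes split across clusters
```

With the first labeling, every query finds both class members among its four nearest shapes; with the second, four of the six queries miss their partner, which lowers the score.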
Ideally, such a similarity measure should yield smaller distances between the variants of a particular object than between this object and any other object, as well as smaller distances between intraclass objects than between interclass objects. However, designing such a measure for the general retrieval task is challenging. Fig. 1 gives an illustration: based on their contours, a horse might have a smaller distance to a dog than to another horse, whereas the human visual system can still identify them correctly.

In this paper, we refer to shape as the contour of an object silhouette. Building correspondences is often the first step in computing the shape difference, but it is challenging: two shapes may not have direct correspondences, regardless of whether they are represented as sparse points, closed contours, or parametric functions. For example, two shapes with the same contour but different starting points are typically considered the same shape. Therefore, the similarity between two shapes is usually measured in one of two ways: 1) by directly comparing features extracted from the shape contours that are invariant to the choice of starting point and robust to a certain degree of deformation, such as moments [1] and Fourier descriptors [2]; or 2) by performing matching to find detailed pointwise correspondences, from which the differences are computed [3]–[8]. The latter has recently become dominant due to its ability to capture intrinsic shape properties, which leads to more accurate similarity measures.

Bai et al. [9] exploited the group contextual information among different shapes to improve the accuracy of shape retrieval on several standard data sets [10], [11]. The basic idea was to use shapes as each other’s contexts in propagation to reduce the distances between intraclass objects. This was implemented with a graph-based transduction approach named label propagation (LP) [12].
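To make the propagation idea concrete, here is a minimal sketch of LP-style graph transduction for a single query, assuming a dense graph with Gaussian-kernel edge weights and a row-normalized transition matrix. The function name `lp_rerank`, the bandwidth `sigma`, the iteration count, and the toy distance matrix are all hypothetical choices, not the settings of [9] or [12]. Because the query is the only clamped node, every score tends to 1 as the iteration count grows, so `iters` acts as a stopping parameter in this sketch.

```python
import math


def lp_rerank(dist, query, sigma=1.0, iters=10):
    """LP-style re-ranking scores for one query (illustrative sketch).

    Returns one score per shape; higher means more similar to the query.
    """
    n = len(dist)
    # Gaussian kernel turns distances into edge weights: w_ij = exp(-d_ij^2 / sigma^2).
    w = [[math.exp(-(dist[i][j] ** 2) / sigma ** 2) for j in range(n)] for i in range(n)]
    # Row-normalize into a transition matrix P = D^{-1} W.
    p = []
    for row in w:
        s = sum(row)
        p.append([x / s for x in row])
    # Only the query is labeled: f_query = 1, all other shapes start at 0.
    f = [0.0] * n
    f[query] = 1.0
    for _ in range(iters):
        # One propagation step f <- P f, computed from the previous f.
        f = [sum(p[i][j] * f[j] for j in range(n)) for i in range(n)]
        f[query] = 1.0  # clamp the labeled query node
    return f


# Hypothetical toy data: shapes 0-3 form a chain of one class; shape 4 is an
# outlier that is closer to the query (shape 0) than shape 3 is.
d = [
    [0.0, 1.0, 2.0, 3.0, 2.5],
    [1.0, 0.0, 1.0, 2.0, 3.5],
    [2.0, 1.0, 0.0, 1.0, 4.5],
    [3.0, 2.0, 1.0, 0.0, 5.5],
    [2.5, 3.5, 4.5, 5.5, 0.0],
]
scores = lp_rerank(d, query=0)
ranking = sorted(range(len(d)), key=lambda j: -scores[j])
print(ranking)
```

Ranking by raw distance places the outlier (shape 4) ahead of shape 3, whereas propagation through shapes 1 and 2 lifts shape 3 above it, which is the contextual effect of using shapes as each other’s contexts.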
Later, several other graph-based transduction methods were suggested for shape retrieval [13], [14]. In addition, the method in [14] further improved the results by adding “ghost points,” which were constructed based on the query shape and its nearest neighbors from the database. Egozi et al. [15] proposed a contextual similarity function, named meta similarity, which characterizes a given object by its similarity to its k-nearest neighbor (k-NN) objects. An interesting distance learning method called contextual dissimilarity measure (CDM)

1057-7149/$26.00 © 2011 IEEE