SUPPORT VECTOR MACHINES FOR REGION-BASED IMAGE RETRIEVAL *

* This work was performed at Microsoft Research Asia. The authors are supported by the National Natural Science Foundation of China, No. 60135010.

Feng Jing
State Key Lab of Intelligent Technology and Systems
Beijing 100084, China
Scenery_JF@hotmail.com

Mingjing Li, Hong-Jiang Zhang
Microsoft Research Asia
49 Zhichun Road
Beijing 100080, China
{mjli, hjzhang}@microsoft.com

Bo Zhang
State Key Lab of Intelligent Technology and Systems
Beijing 100084, China
dcszb@mail.tsinghua.edu.cn

ABSTRACT

In this paper, the application of support vector machines (SVM) in relevance feedback for region-based image retrieval is investigated. Both the one-class SVM as a class distribution estimator and the two-class SVM as a classifier are considered. For the latter, two representative display strategies are studied. Because common kernels often rely on the inner product or the L_p norm in the input space, they are infeasible in region-based image retrieval systems that use variable-length representations. To resolve this issue, a new kernel that generalizes the Gaussian kernel is proposed. Experimental results on a database of 10,000 general-purpose images demonstrate the effectiveness and robustness of the proposed approach.

1. INTRODUCTION

Many early content-based image retrieval (CBIR) systems perform retrieval based primarily on global features [5][9]. Users of a CBIR system often look for objects, and in that case such systems are likely to fail, since a single signature computed for the entire image cannot sufficiently capture the important properties of individual objects. Region-based image retrieval (RBIR) methods [1][6][11] attempt to overcome this drawback of global features by representing images at the object level, which is intended to be close to the perception of the human visual system.
To narrow the gap between low-level features and high-level concepts, relevance feedback (RF), initially developed in text retrieval [8], was introduced into CBIR during the mid-1990s and has been shown to provide a dramatic performance boost in retrieval systems [6][10]. The main idea of RF is to let users guide the system. During the retrieval process, the user interacts with the system and rates the relevance of the retrieved images according to his or her subjective judgment. With this additional information, the system dynamically learns the user’s intention and gradually presents better results.

As a core machine learning technology, SVM has not only strong theoretical foundations but also excellent empirical successes [3]. SVM has also been introduced into CBIR as a powerful RF tool, and it performs fairly well in systems that use global representations [2][10][12]. Since common SVM kernels usually rely on the inner product or the L_p norm in the input space, they are inappropriate for RBIR systems [6][11] that use variable-length representations. To address this issue, a new kernel is proposed. It is a generalization of the Gaussian kernel in which the Euclidean distance is replaced by the Earth Mover’s Distance (EMD) [7].

The remainder of the paper is organized as follows. In Section 2, previous work related to SVM in CBIR is summarized. The basic elements of a region-based image retrieval system are explained in Section 3. In Section 4, a new kernel is introduced. Preliminary experimental results are given in Section 5. Finally, we conclude in Section 6.

2. SVM IN CBIR

At least two criteria characterize the application of SVM in CBIR.

2.1. One Class vs. Two Classes

Given the relevance feedback information, two kinds of learning can generally be performed to boost retrieval performance. One is to estimate the distribution of the target images, while the other is to learn a boundary that separates the target images from the rest.
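To make the distinction between the two kinds of learning concrete, the following minimal Python sketch scores candidate images in both ways. It is not an SVM and not the method of [2]: as a deliberately simpler stand-in, the one-class view is approximated by a mean kernel similarity to the relevant images (a Parzen-style density score), and the two-class view by the difference between the mean similarity to relevant and to irrelevant images. The function names, the 2-D feature vectors, and the kernel width are all hypothetical and chosen only for illustration.

```python
import math


def gaussian_kernel(x, y, sigma=1.0):
    """Standard Gaussian kernel on fixed-length feature vectors."""
    sq_dist = sum((a - b) ** 2 for a, b in zip(x, y))
    return math.exp(-sq_dist / (2.0 * sigma ** 2))


def density_score(candidate, positives, kernel=gaussian_kernel):
    """One-class view: mean kernel similarity to the relevant images only."""
    return sum(kernel(candidate, p) for p in positives) / len(positives)


def boundary_score(candidate, positives, negatives, kernel=gaussian_kernel):
    """Two-class view: similarity to relevant minus similarity to irrelevant.

    A positive value places the candidate on the 'relevant' side.
    """
    return (density_score(candidate, positives, kernel)
            - density_score(candidate, negatives, kernel))


# Hypothetical feedback: images the user marked relevant / irrelevant.
positives = [(0.0, 0.0), (0.2, 0.1)]
negatives = [(1.0, 1.0), (1.2, 0.9)]

# Hypothetical unlabeled candidates to be ranked.
candidates = {"a": (0.1, 0.0), "b": (0.6, 0.5), "c": (1.1, 1.0)}

# One-class ranking uses only the relevant examples.
one_class_ranking = sorted(
    candidates,
    key=lambda name: density_score(candidates[name], positives),
    reverse=True,
)

# Two-class ranking uses both relevant and irrelevant examples.
two_class_ranking = sorted(
    candidates,
    key=lambda name: boundary_score(candidates[name], positives, negatives),
    reverse=True,
)

print(one_class_ranking)
print(two_class_ranking)
```

Because the kernel is passed in as a parameter, a kernel defined on other representations could be substituted without changing either scoring function. The one-class score never looks at the irrelevant examples, which is the essential difference from the two-class score.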
For the former, i.e., estimating the distribution of the target images, the so-called one-class SVM was adopted [2]. A kernel
II - 21 0-7803-7965-9/03/$17.00 ©2003 IEEE ICME 2003