A COLLABORATIVE BAYESIAN IMAGE RETRIEVAL FRAMEWORK

Rui Zhang, Ling Guan

Ryerson Multimedia Research Laboratory
Ryerson University, Toronto, Canada
{rzhang, lguan}@ee.ryerson.ca

ABSTRACT

In this paper, an image retrieval framework combining content-based and content-free methods is proposed, which employs both short-term relevance feedback (STRF) and long-term relevance feedback (LTRF) as the means of user interaction. STRF refers to iterative query-specific model learning during a retrieval session, and LTRF is the estimation of a user history model from the past retrieval results approved by previous users. The framework is formulated based on Bayes' theorem, in which the results from STRF and LTRF refine the likelihood and the a priori information, respectively, and the images are ranked according to the a posteriori probability. Since the estimation of the user history model is based on the principle of collaborative filtering, the system is referred to as a collaborative Bayesian image retrieval (CLBIR) framework. To evaluate the effectiveness of the proposed framework, nearest neighbor CLBIR (NN-CLBIR) and support vector machine active learning CLBIR (SVMAL-CLBIR) were implemented. Experimental results showed improvement over content-based methods in terms of both accuracy and ranking as a result of the integration in the proposed framework.

Index Terms: image retrieval, Bayesian framework

1. INTRODUCTION

Multimedia information has grown continuously since the beginning of the information era. An immediate challenge resulting from this information explosion is how to manage and enjoy multimedia databases intelligently. Content-based image retrieval (CBIR) has been intensively studied for more than a decade, yet it remains a challenging topic [1].
Conventional CBIR systems exploiting global low-level features have proven effective only to the extent of pre-attentive similarity, owing to the semantic gap. Recognizing the critical role of human beings in identifying semantic content in multimedia objects, researchers applied relevance feedback (RF) to CBIR. Modern techniques approach RF by approximating a function consistent with human visual perception [2–4], resulting in significant improvement. We refer to these RF techniques as short-term relevance feedback (STRF), as they terminate once a user is satisfied with the results or gives up the query. On the other hand, we believe that a successful retrieval system should be capable of learning a history model of the vast majority of the users from the past retrieval results, since these contain valuable semantic information that may improve database-wide semantic indexing. We refer to the technique of learning a user history model as long-term relevance feedback (LTRF), because it can be a life-long process involving human-computer interaction.

In this paper, we propose a new image retrieval strategy in which the content-based and the content-free [5] methods are seamlessly integrated into a mathematically justifiable framework. User interaction is carried out through the combination of STRF and LTRF. We formulate the task based on Bayes' theorem, in which the content-based similarity measure is treated as the likelihood evaluation, which can be updated using STRF, and the probability estimated using content-free approaches serves as the a priori information. The a posteriori probability is used to rank the images in the database. For the likelihood evaluation, we adopted both nearest-neighbor CBIR (NN-CBIR) and support vector machine active learning CBIR (SVMAL-CBIR). As for the content-independent component, we employed MaxEnt-based content-free image retrieval (CFIR). Numerical results demonstrated better performance than that of a simple content-based system with only STRF.
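
The ranking rule described above, a content-based likelihood multiplied by a content-free prior, can be illustrated with a minimal Python sketch. It is not the authors' implementation: the function names `toy_likelihood` and `rank_by_posterior` are hypothetical, the Gaussian-kernel likelihood is an assumed stand-in for the NN-CBIR or SVMAL-CBIR measures, and the prior values would in practice come from the user history model. Images are indexed from 0 here purely for convenience.

```python
import math
from typing import List, Optional, Sequence, Tuple


def toy_likelihood(query: Sequence[float],
                   features: Sequence[Sequence[float]],
                   bandwidth: float = 1.0) -> List[float]:
    """Hypothetical content-based score: Gaussian kernel on Euclidean distance.

    This only stands in for the likelihood of the query given each image; it is
    an assumption of this sketch, not the NN-CBIR or SVMAL-CBIR measure.
    """
    scores = []
    for feat in features:
        sq_dist = sum((q - f) ** 2 for q, f in zip(query, feat))
        scores.append(math.exp(-sq_dist / (2.0 * bandwidth ** 2)))
    return scores


def rank_by_posterior(likelihoods: Sequence[float],
                      priors: Optional[Sequence[float]] = None
                      ) -> Tuple[List[int], List[float]]:
    """Rank images by posterior, taken as proportional to likelihood * prior.

    Returns (ranking, posterior): image indices sorted from most to least
    probable, and the normalized posterior for each image.
    """
    n = len(likelihoods)
    if n == 0:
        raise ValueError("at least one candidate image is required")
    if priors is None:
        # No user history supplied: fall back to a uniform prior.
        priors = [1.0 / n] * n
    if len(priors) != n:
        raise ValueError("likelihoods and priors must have the same length")

    unnormalized = [lik * pri for lik, pri in zip(likelihoods, priors)]
    total = sum(unnormalized)
    if total <= 0.0:
        raise ValueError("posterior mass must be positive")

    posterior = [u / total for u in unnormalized]
    # Sort indices by descending posterior; ties keep their original order.
    ranking = sorted(range(n), key=lambda i: posterior[i], reverse=True)
    return ranking, posterior
```

For instance, with likelihoods [0.2, 0.5, 0.3] and priors [0.5, 0.1, 0.4], the unnormalized posteriors are 0.10, 0.05 and 0.12, so the third image ranks first even though the second has the highest likelihood. This is the kind of reordering that a history-based prior is meant to contribute.
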
In addition, even when no user history is available, the system can still function because the a priori distribution over the images is then simply uniform, whereas CFIR alone fails to work in that case [6]. Since the a priori knowledge is extracted using a collaborative filtering technique, the proposed system is referred to as a collaborative Bayesian image retrieval (CLBIR) framework.

2. THE PROPOSED FRAMEWORK

Let a query be represented by a vector $x_q$, where $x_q \in \mathbb{R}^d$. The goal of the framework is to rank the candidate images using the a posteriori probability $P(\omega \mid x_q, I)$, where $\omega \in W$ is the index of an image in a database, $W = \{1, 2, \ldots, N\}$, $N$ is the number of images, and $I$ is the background information. According to Bayes' theorem, the a posteriori probability of an image given a query can be written as

$$P(\omega \mid x_q, I) \propto p(x_q \mid \omega, I)\, P(\omega \mid I), \tag{1}$$

where equality is replaced by proportionality because the probability density function (PDF) of the observation $x_q$ is the same for every $\omega$ and therefore acts only as a normalization constant.

In the CLBIR framework, $I = \{I_{q,1}, I_{q,2}, \ldots, I_{q,Q}\}$ is the set of indexes of the query images, where $I_{q,i} \in W$, $i = 1, 2, \ldots, Q$, and $Q$ is the number of query images. When $1 < Q \ll N$, $x_q = \frac{1}{Q} \sum_{i=1}^{Q} x_{q,i}$, where $x_{q,i} \in \mathbb{R}^d$ is the feature vector of the query image $I_{q,i}$. According to this interpretation of $I$, (1) can be simplified as

$$P(\omega \mid x_q, I) \propto p(x_q \mid \omega)\, P(\omega \mid I). \tag{2}$$

Based on (2), the information utilized for ranking candidate images consists of the similarity measure based on visual content and

1953 978-1-4244-2354-5/09/$25.00 ©2009 IEEE ICASSP 2009