A COLLABORATIVE BAYESIAN IMAGE RETRIEVAL FRAMEWORK
Rui Zhang, Ling Guan
Ryerson Multimedia Research Laboratory
Ryerson University, Toronto, Canada
{rzhang, lguan}@ee.ryerson.ca
ABSTRACT
In this paper, an image retrieval framework combining content-based
and content-free methods is proposed, which employs both short-
term relevance feedback (STRF) and long-term relevance feedback
(LTRF) as the means of user interaction. The STRF refers to iter-
ative query-specific model learning during a retrieval session, and
the LTRF is the estimation of a user history model from past
retrieval results approved by previous users. The framework is
formulated using Bayes’ theorem, in which the results from STRF
and LTRF refine the likelihood and the a priori
information, respectively, and the images are ranked according to
the a posteriori probability. Since the estimation of the user history
model is based on the principle of collaborative filtering, the system
is referred to as a collaborative Bayesian image retrieval (CLBIR)
framework. To evaluate the effectiveness of the proposed frame-
work, nearest neighbor CLBIR (NN-CLBIR) and support vector ma-
chine active learning CLBIR (SVMAL-CLBIR) were implemented.
Experimental results showed that integrating the two methods in the
proposed framework improves on purely content-based methods in
terms of both accuracy and ranking.
Index Terms— image retrieval, Bayesian framework
1. INTRODUCTION
The volume of multimedia information has grown continuously since
the beginning of the information era. An immediate challenge
resulting from this information explosion is how to intelligently
manage and enjoy multimedia databases. Content-based image
retrieval (CBIR) has been intensively studied for more than a
decade, yet it remains a challenging topic [1]. Owing to the
semantic gap, conventional CBIR systems that exploit global
low-level features have proven effective only to the extent of
pre-attentive similarity. Given the critical role of human beings
in recognizing semantic content in multimedia objects,
relevance feedback (RF) was applied to CBIR. Modern techniques
approach RF by approximating a function consistent with human vi-
sual perception [2–4], resulting in significant improvement. We refer
to these RF techniques as short-term relevance feedback (STRF) as
they are terminated once a user is satisfied by the results or gives up
the query. On the other hand, we believe that a successful retrieval
system should be capable of learning a history model of the vast
majority of the users from the past retrieval results since they con-
tain valuable semantic information which may improve the database-
wide semantic indexing. We refer to the technique of learning a user
history model as long-term relevance feedback (LTRF) because it
can be a lifelong process involving human-computer interaction.
In this paper, we propose a new image retrieval strategy, in
which the content-based and the content-free [5] methods are seam-
lessly integrated into a mathematically justifiable framework. User
interaction is carried out through the combination of STRF and
LTRF. We formulate the task using Bayes’ theorem: the
content-based similarity measure acts as the likelihood, which
can be updated using STRF, and the probability estimated by
content-free approaches serves as the a priori information.
The a posteriori probability is used to rank the images
in the database. For the likelihood evaluation, we adopted both
nearest-neighbor CBIR (NN-CBIR) and support vector machine
active learning CBIR (SVMAL-CBIR). For the content-free
component, we employed MaxEnt-based content-free image
retrieval (CFIR). Numerical results demonstrated better
performance than a content-based system using only STRF.
In addition, even when no user history is available, the system
still functions, because the a priori distribution over the images
is then simply uniform; CFIR alone, by contrast, fails to work in
that case [6]. Since the a priori knowledge is extracted using a
collaborative filtering technique, the proposed system is referred
to as a collaborative Bayesian image retrieval (CLBIR) framework.
2. THE PROPOSED FRAMEWORK
Let a query be represented by a vector $x_q \in \mathbb{R}^d$. The
goal of the framework is to rank the candidate images using the
a posteriori probability $P(\omega \mid x_q, I)$, where $\omega \in W$ is
the index of an image in a database, $W = \{1, 2, \ldots, N\}$, $N$ is the
number of images, and $I$ is the background information. According
to Bayes’ theorem, the a posteriori probability of an image given a
query can be written as
$$P(\omega \mid x_q, I) \propto p(x_q \mid \omega, I)\, P(\omega \mid I), \tag{1}$$
where proportionality replaces equality because the probability
density function (PDF) of the observation $x_q$ does not depend on
$\omega$ and therefore acts only as a normalization constant. In the
CLBIR framework, $I = \{I_{q,1}, I_{q,2}, \ldots, I_{q,Q}\}$ is the set of
indexes of the query images, where $I_{q,i} \in W$, $i = 1, 2, \ldots, Q$,
and $Q$ is the number of query images. When $1 < Q \ll N$,
$x_q = \frac{1}{Q} \sum_{i=1}^{Q} x_{q,i}$, where $x_{q,i} \in \mathbb{R}^d$
is the feature vector of the query image $I_{q,i}$. Given this
interpretation of $I$, (1) can be simplified as
$$P(\omega \mid x_q, I) \propto p(x_q \mid \omega)\, P(\omega \mid I). \tag{2}$$
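As a minimal sketch of the ranking rule in (2), the Python code below
averages the query feature vectors, multiplies a per-image likelihood
by a per-image prior, and sorts the images by the resulting
a posteriori probability. The Gaussian kernel used as the likelihood,
the parameter sigma, and all function names are illustrative
assumptions; they are not the NN or SVMAL likelihood models used in
this paper. The prior is assumed to be supplied by some LTRF model and
falls back to a uniform distribution when no user history exists.
Image indexes in the code are 0-based.

```python
import numpy as np


def mean_query(query_features):
    """Average the Q query feature vectors into a single query vector x_q."""
    return np.mean(np.asarray(query_features, dtype=float), axis=0)


def gaussian_likelihood(x_q, database, sigma=1.0):
    """Hypothetical stand-in for p(x_q | omega): an unnormalized isotropic
    Gaussian kernel centred on the feature vector of each database image."""
    database = np.asarray(database, dtype=float)
    sq_dist = np.sum((database - x_q) ** 2, axis=1)
    return np.exp(-sq_dist / (2.0 * sigma ** 2))


def rank_by_posterior(likelihood, prior=None):
    """Rank images by P(omega | x_q, I), proportional to likelihood * prior."""
    likelihood = np.asarray(likelihood, dtype=float)
    if prior is None:
        # No user history: assume a uniform prior over the N images.
        prior = np.full(likelihood.shape, 1.0 / likelihood.size)
    posterior = likelihood * np.asarray(prior, dtype=float)
    posterior = posterior / posterior.sum()  # normalize so the values sum to 1
    order = np.argsort(-posterior, kind="stable")  # best match first
    return order, posterior


if __name__ == "__main__":
    database = [[0.0, 0.0], [1.0, 0.0], [3.0, 0.0], [0.0, 2.0]]  # N = 4, d = 2
    x_q = mean_query([[0.0, 0.0], [0.5, 0.0]])                   # Q = 2
    lik = gaussian_likelihood(x_q, database)
    print(rank_by_posterior(lik)[0])                        # uniform prior
    print(rank_by_posterior(lik, [0.1, 0.1, 0.1, 0.7])[0])  # prior favours image 3
```

With the uniform prior, the ordering coincides with the likelihood
ordering, which is consistent with the framework still functioning
without user history. A non-uniform prior can promote an image that is
visually less similar to the query, as the second call illustrates.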
Based on (2), the information utilized for ranking candidate im-
ages consists of the similarity measure based on visual content and
1953 978-1-4244-2354-5/09/$25.00 ©2009 IEEE ICASSP 2009