Pattern Recognition 42 (2009) 218--228
Image annotation via graph learning
Jing Liu a,∗, Mingjing Li b, Qingshan Liu a, Hanqing Lu a, Songde Ma a
a Institute of Automation, Chinese Academy of Sciences, No. 95, Zhongguancun East Road, Beijing 100080, China
b Microsoft Research Asia, Beijing 100080, China
Article history: Received 15 December 2007; Received in revised form 21 March 2008; Accepted 16 April 2008
Keywords: Graph learning; Image annotation; Image similarity; Word correlation

Abstract
Image annotation has been an active research topic in recent years due to its potential impact on both
image understanding and web image search. In this paper, we propose a graph learning framework for
image annotation. First, the image-based graph learning is performed to obtain the candidate annota-
tions for each image. In order to capture the complex distribution of image data, we propose a Nearest
Spanning Chain (NSC) method to construct the image-based graph, whose edge-weights are derived
from the chain-wise statistical information instead of the traditional pairwise similarities. Second, the
word-based graph learning is developed to refine the relationships between images and words to get
final annotations for each image. To enrich the representation of the word-based graph, we design two
types of word correlations based on web search results besides the word co-occurrence in the training
set. The effectiveness of the proposed solution is demonstrated by experiments on the Corel dataset
and a web image dataset.
© 2008 Elsevier Ltd. All rights reserved.
1. Introduction
With the advent of digital imagery, the number of digital images has been growing rapidly, and there is an increasing need to index and search these images effectively. Systems using non-textual (image) queries have been proposed, but many users find it hard to express their queries with abstract image features. Most users prefer textual queries, i.e., keyword-based image search, which is typically achieved by manually providing image annotations and searching over these annotations with a textual query. However, manual annotation is an expensive and tedious procedure. Thus, automatic image annotation is necessary for efficient image retrieval.
Generally, image annotation methods aim to learn the semantics of untagged images from annotated images according to image similarities. Its probabilistic interpretation is to find the set of keywords w^* (in a given vocabulary V) that maximizes the joint probability P(w, I_q) as follows^1:

w^* = \arg\max_{w \subset V} P(w, I_q) = \arg\max_{w \subset V} \sum_{I_i \in T} P(w \mid I_i)\, P(I_q \mid I_i)\, P(I_i)    (1)
∗ Corresponding author. Tel.: +86 1062542971. E-mail address: liujingmgm@gmail.com (J. Liu).
1 Here we assume that the events of observing the words w and the image I_q are mutually independent once the training image I_i is picked.
doi:10.1016/j.patcog.2008.04.012
where I_q is an untagged image, T is the set of annotated training images, P(I_q | I_i) denotes the probability that I_i is relevant (or similar) to I_q, and P(w | I_i) represents the likelihood that I_i can be annotated with w.
This formulation shows that a basic image annotation system rests on two relations: the image-to-image relation (IIR) and the image-to-word relation (IWR). Typically, IIR is built from visual features and is therefore available for any given dataset. IWR indicates the likelihood of a word given an image, i.e., a word model learned from the annotated images. Given IIR and IWR, annotations of untagged images can be obtained by similarity propagation. In addition, the word-to-word relation (WWR) can be exploited to refine the annotations so as to maintain semantic consistency among them.
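One propagation step over these relations can be sketched as a matrix product: with IIR stored as an image-similarity matrix S and IWR as an image-word score matrix Y, an untagged image inherits word scores from its visual neighbors. The matrices below and the row normalization are illustrative assumptions.

```python
import numpy as np

# IIR: pairwise visual similarities among 3 images.
S = np.array([[1.0, 0.8, 0.1],
              [0.8, 1.0, 0.2],
              [0.1, 0.2, 1.0]])
# IWR: image-to-word scores for 2 words; row 2 is an untagged image (all zeros).
Y = np.array([[1.0, 0.0],
              [1.0, 1.0],
              [0.0, 0.0]])
P = S / S.sum(axis=1, keepdims=True)  # normalize so each row sums to 1
F = P @ Y                             # propagate word scores along image similarities
print(F[2])                           # word scores inherited by the untagged image
```

The untagged image's scores are weighted averages of its neighbors' word scores, which is the essence of similarity propagation.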
We therefore formulate image annotation as a graph learning framework that comprises image-based graph learning and word-based graph learning. The image-based graph learning is first performed to learn the relationships between images and words, i.e., to obtain candidate annotations for each image; the word-based graph learning then refines these relationships by exploiting word correlations.
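A common instantiation of such graph learning, though not necessarily the exact scheme of this paper, is the label-propagation iteration of Zhou et al., F ← αSF + (1−α)Y, which balances smoothness over the graph against fidelity to the initial scores. A minimal sketch, assuming S is a normalized affinity matrix with spectral radius below 1/α:

```python
import numpy as np

def graph_learn(S, Y, alpha=0.5, iters=50):
    """Iterate F <- alpha * S @ F + (1 - alpha) * Y until (near) convergence.

    S: normalized affinity matrix of the graph; Y: initial label/score matrix.
    The paper's scheme may differ, e.g. in how the edge weights of S are built.
    """
    F = Y.copy()
    for _ in range(iters):
        F = alpha * S @ F + (1 - alpha) * Y
    return F
```

The iteration converges to the closed form F* = (1−α)(I − αS)^{-1} Y, so the fixed point can also be obtained by a single linear solve.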
How to build the similarity graph is crucial in graph learning. A good graph should reflect a deep understanding of the data structure and help to mine as much potential knowledge as possible. In previous studies, the graph is often constructed from k-NN or ε-ball-based pairwise similarities. However, such pairwise relations may not satisfy the requirements of image annotation on large image databases, owing to limited domain knowledge and the complex distribution of image data. In order to better capture this complex distribution, in this paper we propose a nearest