Pattern Recognition 42 (2009) 218–228

Contents lists available at ScienceDirect: Pattern Recognition
Journal homepage: www.elsevier.com/locate/pr

Image annotation via graph learning

Jing Liu a,*, Mingjing Li b, Qingshan Liu a, Hanqing Lu a, Songde Ma a

a Institute of Automation, Chinese Academy of Sciences, No. 95, Zhongguancun East Road, Beijing 100080, China
b Microsoft Research Asia, Beijing 100080, China

ARTICLE INFO

Article history:
Received 15 December 2007
Received in revised form 21 March 2008
Accepted 16 April 2008

Keywords:
Graph learning
Image annotation
Image similarity
Word correlation

ABSTRACT

Image annotation has been an active research topic in recent years due to its potential impact on both image understanding and web image search. In this paper, we propose a graph learning framework for image annotation. First, image-based graph learning is performed to obtain the candidate annotations for each image. In order to capture the complex distribution of image data, we propose a Nearest Spanning Chain (NSC) method to construct the image-based graph, whose edge weights are derived from chain-wise statistical information instead of the traditional pairwise similarities. Second, word-based graph learning is developed to refine the relationships between images and words and obtain the final annotations for each image. To enrich the representation of the word-based graph, we design two types of word correlations based on web search results, in addition to the word co-occurrence in the training set. The effectiveness of the proposed solution is demonstrated by experiments on the Corel dataset and a web image dataset.

© 2008 Elsevier Ltd. All rights reserved.

1. Introduction

With the advent of digital imagery, the number of digital images has been growing rapidly, and there is an increasing need to index and search these images effectively.
Systems using non-textual (image) queries have been proposed, but many users find it hard to express their queries with abstract image features. Most users prefer textual queries, i.e., keyword-based image search, which is typically achieved by manually providing image annotations and searching over these annotations with a textual query. However, manual annotation is an expensive and tedious procedure. Thus, automatic image annotation is necessary for efficient image retrieval.

Generally, image annotation methods aim to learn the semantics of untagged images from annotated images according to image similarities. The probabilistic interpretation is to find the set of keywords w (in a given vocabulary V) that maximizes the joint probability P(w, I_q) as follows^1:

w^* = \arg\max_{w \subseteq V} P(w, I_q) = \arg\max_{w \subseteq V} \sum_{I_i \in T} P(w \mid I_i) \, P(I_q \mid I_i) \, P(I_i)    (1)

where I_q is an untagged image, T is the set of annotated training images, P(I_q | I_i) denotes the probability that I_i is relevant (or similar) to I_q, and P(w | I_i) represents the likelihood that I_i can be annotated with w.

From this formulation, a basic image annotation system consists of two relations: the image-to-image relation (IIR) and the image-to-word relation (IWR). Typically, IIR is built from visual features, which are available given a dataset. IWR indicates the likelihood of a word given an image, i.e., a word model learned from the annotated images. Given IIR and IWR, annotations of untagged images can be obtained by similarity propagation. In addition, the word-to-word relation (WWR) can be explored to refine the annotations so as to maintain semantic consistency among them.

* Corresponding author. Tel.: +86 1062542971. E-mail address: liujingmgm@gmail.com (J. Liu).
^1 Here we assume that the events of observing the words w and the image I_q are mutually independent once the training image I_i is picked.

doi:10.1016/j.patcog.2008.04.012
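To make Eq. (1) concrete, the following is a minimal sketch (not the paper's implementation) of the scoring it implies, assuming IIR is given as a vector of similarities P(I_q | I_i) and IWR as a matrix of word likelihoods P(w | I_i); the function name `annotate_scores` and the uniform prior P(I_i) = 1/|T| are illustrative assumptions.

```python
import numpy as np

def annotate_scores(sim_q, p_word_given_img, prior=None, top_k=5):
    """Rank vocabulary words for an untagged image I_q via Eq. (1).

    sim_q            : (n_train,) array, P(I_q | I_i) for each training image
    p_word_given_img : (n_train, n_words) array, P(w | I_i)
    prior            : (n_train,) array P(I_i); uniform if None
    Returns the indices of the top_k words by P(w, I_q).
    """
    n_train = sim_q.shape[0]
    if prior is None:
        prior = np.full(n_train, 1.0 / n_train)
    # P(w, I_q) = sum_i P(w | I_i) * P(I_q | I_i) * P(I_i)
    scores = (p_word_given_img * (sim_q * prior)[:, None]).sum(axis=0)
    return np.argsort(scores)[::-1][:top_k]

# Toy example: 2 training images, 2 words.
# Image 0 strongly suggests word 0; the query is much more similar to image 0.
p_w = np.array([[0.9, 0.1],
                [0.2, 0.8]])
sim = np.array([1.0, 0.1])
ranked = annotate_scores(sim, p_w, top_k=2)
```

In this toy case the marginal scores are 0.5·(0.9·1.0 + 0.2·0.1) for word 0 versus 0.5·(0.1·1.0 + 0.8·0.1) for word 1, so word 0 ranks first, matching the intuition that annotations propagate from the most similar training images.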
We then formulate the problem of image annotation in a graph learning framework that includes image-based graph learning and word-based graph learning. Image-based graph learning is performed first to learn the relationships between images and words, i.e., to obtain the candidate annotations for each image; word-based graph learning is then used to refine the obtained relationships by exploring word correlations.

How to build a similarity graph is very important in graph learning. A good graph should reflect a deep understanding of the data structure and help to mine as much potential knowledge as possible. In previous studies, the graph is often constructed from k-NN or ε-ball-based pairwise similarities. However, these pairwise relations may not satisfy the requirements of image annotation on large image databases, due to the limited domain knowledge and the complex distribution of image data. In order to better capture the complex distribution of image data, in this paper, we propose a nearest