COLLABORATIVE IMAGE ANNOTATION USING IMAGE WEBS

Zixuan Wang 1,*, Omprakash Gnawali 2, Kyle Heath 1, and Leonidas J. Guibas 2

1 Department of Electrical Engineering, Stanford University, Stanford, CA, 94305
2 Department of Computer Science, Stanford University, Stanford, CA, 94305

ABSTRACT

The widespread availability of hand-held devices equipped with cameras has facilitated the creation of massive image collections. Our method links together image regions containing instances of the same object to form a graph called an Image Web. Such graphs represent relationships between images based on shared visual content. We demonstrate how to use Image Webs as conduits for propagating symbolic information among images. Symbolic information includes annotations provided by users who may have special expertise or may have been close to where the sensor data was captured. Such annotations can then be propagated to related images of the same object and benefit other users. Our algorithm assigns similarity weights to edges in the Image Web graph. These weights are used to attenuate the relevance of annotations as they propagate along edges of the graph. Experiments show that our system enables multiple users to share and annotate images collaboratively, quickly and accurately.

1. INTRODUCTION

The widespread availability of hand-held devices equipped with cameras has facilitated the creation of massive image collections. These massive data sets have remained a largely untapped resource of information because of the difficulty of automatically discovering useful structure in the image content. We build upon the work in [Heath et al., 2010], which proposed building graphs called Image Webs to represent relationships between images in a collection induced by shared objects. Each node in an Image Web corresponds to a region in an image that can be matched to a region in another image.
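The propagation mechanism described above, where similarity weights on edges attenuate an annotation's relevance as it spreads, can be sketched in code. This is an illustrative sketch under our own assumptions: the `ImageWeb` class, the max-product attenuation rule (relevance multiplied by each edge weight along the best path), and the cutoff `threshold` are hypothetical choices for exposition, not the authors' exact formulation.

```python
from collections import defaultdict
import heapq


class ImageWeb:
    """Hypothetical sketch of an Image Web with similarity-weighted edges."""

    def __init__(self):
        # node -> {neighbor: similarity weight in (0, 1]}
        self.adj = defaultdict(dict)

    def add_edge(self, u, v, weight):
        # Undirected edge between two matched image regions.
        self.adj[u][v] = weight
        self.adj[v][u] = weight

    def propagate(self, source, annotation, threshold=0.2):
        """Spread `annotation` outward from `source`, attenuating its
        relevance by the product of edge weights along the best path
        (a Dijkstra-like max-product search); stop below `threshold`."""
        best = {source: 1.0}
        heap = [(-1.0, source)]
        while heap:
            neg_rel, u = heapq.heappop(heap)
            rel = -neg_rel
            if rel < best.get(u, 0.0):
                continue  # stale heap entry
            for v, w in self.adj[u].items():
                r = rel * w
                if r >= threshold and r > best.get(v, 0.0):
                    best[v] = r
                    heapq.heappush(heap, (-r, v))
        # Return annotated nodes with their attenuated relevance scores.
        return {node: (annotation, r) for node, r in best.items()
                if node != source}
```

For example, with a chain img1 -(0.9)- img2 -(0.5)- img3 -(0.3)- img4, an annotation on img1 would reach img2 with relevance 0.9 and img3 with 0.45, while img4 falls below the 0.2 threshold and is not annotated.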
Edges in the Image Web connect regions in different images that are similar under an affine transform. These regions are extracted using a process called Affine Co-segmentation to produce match links between images. The graph also has edges connecting distinct regions that occur in the same image. We present a system that leverages this graph structure to enable users to share semantic information about objects in large collections of images. Fig. 1 shows images connected from a front view to a back view through intermediate views.

Fig. 1: An example of Image Webs (front view to back view through intermediate images)

Such a collaborative annotation system can be extremely useful in the military context. When a soldier takes images of incidents and objects of interest, the soldier can be automatically notified of all the relevant information about the objects in those images. The notification can include information about the object provided by experts or by another soldier who saw the object earlier during a patrol. We automate the inference of relations between objects, even across images, and propagate the relevant information about the objects using those relations.

The focus of this paper is to utilize these densely linked image collections to propagate symbolic information such as annotations. This work has three main contributions. First, Image Webs are built incrementally as users add new images to the server. This differs from [Heath et al., 2010], which builds Image Webs over a large-scale image col-