Analysis of User Image Descriptions and Automatic Image Indexing Vocabularies: An Exploratory Study

Niranjan Balasubramanian, Anne R. Diekema, Abby A. Goodrum
Center for Natural Language Processing / School of Information Studies
Syracuse University
{nbalasub,diekemar,aagoodru}@syr.edu

Abstract

This study explores the terms assigned by users to index, manage, and describe images and compares them to indexing terms derived automatically by systems for image retrieval. Results indicate that user-derived indexing vocabulary largely reflects what users see in an image or what they perceive as its overall topic. This contrasts with system-derived indexing, in which terms are extracted from existing text surrounding the image. In many cases, the surrounding text does not describe the image; rather, the image is used to illustrate or expand upon the text. System-derived vocabulary may therefore describe higher-level concepts, for example, industrial pollution rather than smoke. The paper concludes with suggestions for the use of natural language processing techniques to provide vocabulary alignment in image retrieval.

1. Introduction

In spite of the increasing availability and sophistication of content-based image retrieval tools (e.g., QBIC, Blobworld), users most often employ text to initiate image searches (Goodrum, Bejune, & Siochi, 2003), and the primary mechanism for image retrieval on the web is still the matching of terms in queries to terms accompanying images. The manual assignment of textual keywords can, however, be time-consuming and inconsistent (Markey, 1984), and humans describe images differently given different tasks. Moreover, it is easier and less costly for web-based search tools (e.g., Google, WebSeek) to create their image indices automatically by exploiting the text accompanying images on webpages, such as captions, filenames, and surrounding document text.
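The indexing practice described above can be illustrated with a minimal sketch: gathering candidate index terms for each image on a page from its filename, its alt text (a caption-like source), and the document text immediately preceding it. The class name, tokenization rules, and example page below are illustrative assumptions, not the code of any actual system.

```python
import re
from html.parser import HTMLParser

class ImageTermExtractor(HTMLParser):
    """Collects candidate index terms for each <img> on a page:
    tokens from the filename, the alt text, and nearby document text.
    (Illustrative sketch; real systems use richer heuristics.)"""
    def __init__(self):
        super().__init__()
        self.images = []       # one term list per <img> encountered
        self.recent_text = []  # running token stream, a proxy for "surrounding text"

    def handle_starttag(self, tag, attrs):
        if tag != "img":
            return
        attrs = dict(attrs)
        terms = []
        # Filename tokens, e.g. "factory_smoke.jpg" -> ["factory", "smoke"]
        name = attrs.get("src", "").rsplit("/", 1)[-1].rsplit(".", 1)[0]
        terms += [t.lower() for t in re.split(r"[_\-\W]+", name) if t]
        # Alt-text tokens (often caption-like)
        terms += [t.lower() for t in re.findall(r"[a-zA-Z]+", attrs.get("alt", ""))]
        # Last few words of the text preceding the image
        terms += self.recent_text[-5:]
        self.images.append(terms)

    def handle_data(self, data):
        self.recent_text += [t.lower() for t in re.findall(r"[a-zA-Z]+", data)]

# Hypothetical page fragment mirroring the paper's pollution/smoke example.
page = ('<p>Industrial pollution rose sharply.</p>'
        '<img src="img/factory_smoke.jpg" alt="Smoke stacks">')
parser = ImageTermExtractor()
parser.feed(page)
print(parser.images[0])
```

Note how the extracted terms mix what is visible in the image (smoke) with the higher-level topic of the surrounding text (pollution), the very divergence from user-assigned terms that this study examines.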
A central question in this practice is whether the terms users create to describe images and image needs correspond well to the terms extracted automatically by search engines for retrieval. This paper reports results from a study exploring the overlap between terms generated by users to describe images and the terms available for automatic extraction by systems for image retrieval on the web.

2. Background / Related Work

A number of studies have explored the issues surrounding human image indexing and description from a variety of perspectives. Markey's landmark 1984 study of interindexer consistency and overlap set the stage for our understanding of the difficulties and costs associated with manual image indexing. Enser (1993) and Goodrum & Spink (1999) examined image requests and demonstrated the wide diversity of terms generated by users to describe image needs, as well as term mismatch between users and indexers. Jorgensen (2003) and O'Connor et al. (1999) investigated the image attributes typically described by diverse, "naïve" participants in several types of tasks across a range of pictorial images.

Additionally, several studies have explored the use of existing text associated with images for automatic image indexing by retrieval systems. Rohini et al. (2000) and Pastra et al. (2003) demonstrated the use of text in image captions to improve automatic image indexing and retrieval performance.

Although automatic indexing of images by extracting existing text appears to offer a great deal of promise, there has been scant research exploring the intersection between the terms humans assign to describe images and the system-generated index terms for those same images. This research is crucial to understanding retrieval effectiveness and to identifying gaps that might be bridged by NLP techniques.