I2R AT IMAGECLEF WIKIPEDIA RETRIEVAL 2010

Kong-Wah WAN, Yan-Tao ZHENG, Sujoy ROY
Computer Vision and Image Understanding, Institute for Infocomm Research,
1 Fusionopolis Way, Singapore 138632

Abstract

We report on our approaches and methods for the ImageCLEF 2010 Wikipedia image retrieval task. A distinctive feature of this year's image collection is that images are associated with unstructured and noisy textual annotations in three languages: English, French and German. Hence, besides following conventional text-based and multimodal approaches, we also devote some effort to investigating multilingual methods. We submitted a total of six runs along the following three directions: 1. augmenting basic text-based indexing with feature selection (three runs); 2. multimodal retrieval that re-ranks text-based results using visual near-duplicates (VND) (one run); and 3. multilingual fusion that combines results from the three language resources indexed separately (two runs). Our best submitted result (i2rcviu MONOLINGUAL, MAP of 0.2126) comes from the latter multilingual fusion approach, indicating the promise of exploiting multilingual resources. For our multimodal re-ranking run, we adopt a pseudo-relevance-feedback approach that builds a visual prototype model of each query without the need for any labeled example images. Essentially, we assume that the top-ranked images from a baseline text retrieval are correct, and re-rank the result list so that images visually similar to the top-ranked images are pushed up the ranks. This VND-based re-ranking is applied to the results of a text baseline (run i2rcviu I2R.baseline, MAP of 0.1847) that indexed images using all available annotations. The visual re-ranking run (i2rcviu I2R.VISUAL.NDK) achieves a MAP of 0.1984, a 7% improvement.
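The VND-based pseudo-relevance-feedback re-ranking described above can be sketched as follows. This is a minimal illustration, not the system's implementation: it assumes precomputed text-retrieval scores and L2-normalised visual features, builds a mean "visual prototype" from the top-ranked text results, and blends text score with visual similarity to the prototype. The function name and the parameters top_k and alpha are illustrative, not the settings used in the runs.

```python
import numpy as np

def vnd_rerank(text_scores, visual_feats, top_k=5, alpha=0.7):
    """Pseudo-relevance-feedback visual re-ranking (sketch).

    text_scores : (N,) text-retrieval scores, higher is better.
    visual_feats: (N, D) L2-normalised visual features.

    Assumes the top_k text results are relevant, averages their
    features into a visual prototype, and re-scores every image
    as a convex combination of (normalised) text score and
    cosine similarity to the prototype.
    """
    order = np.argsort(-text_scores)
    # Visual prototype: mean of the pseudo-relevant top-k features.
    prototype = visual_feats[order[:top_k]].mean(axis=0)
    prototype /= np.linalg.norm(prototype) + 1e-12
    vis_sim = visual_feats @ prototype  # cosine similarity (unit vectors)

    def minmax(x):
        rng = x.max() - x.min()
        return (x - x.min()) / rng if rng > 0 else np.zeros_like(x)

    fused = alpha * minmax(text_scores) + (1 - alpha) * minmax(vis_sim)
    return np.argsort(-fused)  # indices in new rank order
```

In this sketch an image with a weak text score but high visual similarity to the prototype is promoted ahead of visually dissimilar images, which is the intended effect of the re-ranking.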
Led by this encouraging result, we apply our VND re-ranking to the results of the multilingual run, and obtain our best retrieval result (not submitted), a MAP of 0.2338.

Keywords: Multimodal Retrieval, Visual Re-ranking, Multilingual Fusion

1 Introduction

We present our approach and methods for the Wikipedia-MM task of ImageCLEF 2010 [9]. A key distinctive feature of this year's benchmark image collection is that images are annotated with unstructured and noisy text in three languages: English (EN), French (FR), and German (DE). Hence, apart from conventional text-based and multimodal (visual+text) approaches, we investigated ways to exploit the multilingual nature of the image corpus. We submitted a total of six runs, focusing our effort along the following three main directions.
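The multilingual fusion direction combines ranked lists obtained by indexing the EN, FR, and DE annotations separately. A minimal late-fusion sketch follows; it uses a weighted CombSUM over min-max-normalised scores, which is an illustrative choice — the exact fusion rule and weights used in the submitted runs may differ.

```python
from collections import defaultdict

def fuse_multilingual(runs, weights=None):
    """Late fusion of ranked lists from separately indexed
    language resources (sketch).

    runs   : dict mapping language code -> list of (doc_id, score),
             higher score is better.
    weights: optional dict of per-language weights (default 1.0).

    Each run's scores are min-max normalised to [0, 1], then
    summed per document (weighted CombSUM). Returns doc_ids
    sorted by fused score, best first.
    """
    weights = weights or {lang: 1.0 for lang in runs}
    fused = defaultdict(float)
    for lang, ranked in runs.items():
        scores = [s for _, s in ranked]
        lo, hi = min(scores), max(scores)
        rng = (hi - lo) or 1.0  # guard against all-equal scores
        for doc_id, s in ranked:
            fused[doc_id] += weights[lang] * (s - lo) / rng
    return sorted(fused, key=fused.get, reverse=True)
```

A document retrieved with moderate scores by two or three language indexes can thus outrank a document retrieved strongly by only one, which is one way multilingual evidence can improve over any monolingual baseline.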