The joint submission of the TU Berlin and Fraunhofer FIRST (TUBFI) to the ImageCLEF2011 Photo Annotation Task

Alexander Binder 1, Wojciech Samek 1,2, Marius Kloft 1, Christina Müller 1, Klaus-Robert Müller 1, and Motoaki Kawanabe 2,1

1 Machine Learning Group, Berlin Institute of Technology (TU Berlin), Franklinstr. 28/29, 10587 Berlin, Germany, www.ml.tu-berlin.de
alexander.binder@tu-berlin.de, wojciech.samek.tu-berlin.de
2 Fraunhofer Institute FIRST, Kekuléstr. 7, 12489 Berlin, Germany
motoaki.kawanabe@first.fraunhofer.de

Abstract. In this paper we present details on the joint submission of TU Berlin and Fraunhofer FIRST to the ImageCLEF 2011 Photo Annotation Task. We sought to experiment with extensions of Bag-of-Words (BoW) models at several levels and to apply several kernel-based learning methods recently developed in our group. For classifier training we used non-sparse multiple kernel learning (MKL) and an efficient multi-task learning (MTL) heuristic based on MKL over kernels from classifier outputs. For the multi-modal fusion we used a smoothing method on tag-based features inspired by Bag-of-Words soft mappings and Markov random walks. We submitted one multi-modal run extended by the user tags and four purely visual runs based on Bag-of-Words models. Our best visual result, which used the MTL method, was ranked first according to mean average precision (MAP) within the purely visual submissions. Our multi-modal submission achieved the first rank by MAP among the multi-modal submissions and the best MAP among all submissions. Submissions by other groups such as BPACAD, CAEN, UvA-ISIS, and LIRIS were ranked closely behind.
Keywords: ImageCLEF, Photo Annotation, Image Classification, Bag-of-Words, Multi-Task Learning, Multiple Kernel Learning, THESEUS

1 Introduction

Our goals were to experiment with extensions of Bag-of-Words (BoW) models at several levels and to combine them with several kernel-based learning methods recently developed in our group while working within the THESEUS project. For this purpose we generated a submission to the annotation task of the ImageCLEF2011 Photo Annotation Challenge [14]. This task required the annotation of 10000 images in the provided test corpus according to the 99 pre-defined categories. Note that this year’s ImageCLEF photo task additionally offers a second challenging competition [14], a concept-based retrieval task. In the following we focus on the former, the annotation task over the 10000 images. The ImageCLEF photo corpus is challenging