LONG TERM LEARNING FOR IMAGE RETRIEVAL OVER NETWORKS

David Picard, Arnaud Revel
ETIS ENSEA/UCP/CNRS UMR 8051
6 avenue du Ponceau, 95014 Cergy-Pontoise Cedex, FRANCE

Matthieu Cord
LIP6 UPMC Paris 6
104 avenue du Président Kennedy, 75016 Paris, FRANCE

ABSTRACT

In this paper, we present a long-term learning system for content-based image retrieval over a network. Relevance feedback is used across sessions to learn both the similarity function and the best routing for the searched category. Our system is based on mobile agents crawling the network in search of relevant images. An ant-behavior algorithm is used to learn the category-dependent routing. With experiments on the TRECVID'05 key-frame dataset, we show that the smart association of category-dependent routing and active learning improves the quality of the retrieval.

1. INTRODUCTION

Thanks to the spread of multimedia devices (such as mobile phones, digital cameras, etc.), huge collections of digital images are available today. Content-Based Image Retrieval (CBIR) has been successfully proposed to tackle the search in these ever-growing collections [1]. The main idea is to build a description based on the image content, and to find similarities between descriptions [2]. Machine learning techniques have been successfully adapted to train a similarity function in interaction with the user (through his or her labeling of the results), leading to the so-called "relevance feedback" [3, 4]. The greatest improvement came with the introduction of active learning, which aims at proposing for labeling the image that will most improve the similarity function when added to the training set [5].

With the expansion of networks such as the Internet, peer-to-peer networks or even personal networks, image retrieval has become a difficult task.
As images are split into many collections over the web, the problem of CBIR is not only to find the most relevant images, but also to locate the relevant collections. Although CBIR in a distributed context has been noted as an interesting improvement [6], it has been, to our knowledge, the focus of few works. In a previous work [7], we presented a system built for CBIR over networks. We set up a smart cooperation between the interactive CBIR and a localization learning in a global architecture based on mobile agents.

However, in a classical CBIR system, all the labels gathered during the interaction are forgotten at the end of the session. One might think of re-using these labels in later sessions in order to benefit from the previous active learning [8]. In a single-user CBIR system, the resulting long-term learning is very slow because few labels are available. In our distributed context, however, we can gather labels over sessions from many users. In that sense, the knowledge given to the system through relevance feedback is huge.

In this paper, we present a generalization of our previous CBIR over networks system for long-term optimization. The localizations of the categories are learned over several sessions, enabling a routing of the mobile agents that is specific to the searched concept.

In the next section, we give an overview of our system. Section 3 describes the routing learning algorithm used during a session. Section 4 describes the long-term optimization. Finally, in Section 5, we present and discuss the experiments and results we obtained using our system on the TRECVID 2005 key-frame dataset¹.

2. RETRIEVAL SCHEME

Our system is based on mobile agent technology. A mobile agent is an autonomous piece of software with the ability to migrate from one computer to another and to continue its execution there.
There are good reasons for using mobile agents in the distributed CBIR context, such as the reduction of the network load (the processing code of the agent being very small in comparison to the feature vector indexes) and the massive parallelization of the computation [9].

As described in Fig. 1, the user starts his query by giving an example or a set of examples to an interface (1). A similarity function based on these examples is built (2). Mobile agents are then launched with a copy of this similarity function (3). Every host of the network contains an agent platform in order to be able to receive and execute incoming mobile agents. The agent movements are influenced by markers (numerical values stored locally on the host) following an ant-like behavior [10, 11], as described in Section 3. Each platform runs an agent that indexes the local images and retrieves the relevant ones for the incoming agents.

¹ See http://www-nlpir.nist.gov/projects/tv2005/
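As a rough illustration of the retrieval scheme of Section 2, the following Python sketch models hosts that store images and markers, a similarity function built from the query examples, and an agent that carries this function from host to host. It is a minimal sketch under stated assumptions and not the authors' implementation: the names Host, make_similarity, choose_next_host and run_agent are hypothetical, the nearest-example distance stands in for the learned similarity function, and choosing the next host with probability proportional to its marker value is only one common ant-like rule (the actual rule is the subject of Section 3).

```python
import random
from dataclasses import dataclass, field


@dataclass
class Host:
    """A network host running an agent platform (hypothetical model)."""
    name: str
    images: dict                                  # image id -> feature vector
    markers: dict = field(default_factory=dict)   # neighbour name -> marker value


def make_similarity(examples):
    """Build a toy similarity function from example feature vectors.

    Assumption: negative squared distance to the nearest example. The paper
    trains its similarity function with relevance feedback instead.
    """
    def similarity(vec):
        return max(-sum((a - b) ** 2 for a, b in zip(vec, ex)) for ex in examples)
    return similarity


def choose_next_host(host, rng):
    """Pick a neighbour with probability proportional to its marker value.

    Assumption: proportional selection; at least one marker must be positive.
    """
    names = list(host.markers)
    weights = [host.markers[n] for n in names]
    return rng.choices(names, weights=weights, k=1)[0]


def run_agent(network, start, similarity, hops, top_k, rng):
    """Move one agent through the network, collecting the best local images."""
    results = []
    current = network[start]
    for _ in range(hops):
        # The local indexing agent ranks this host's images for the visitor.
        ranked = sorted(current.images,
                        key=lambda i: similarity(current.images[i]),
                        reverse=True)
        results.extend((current.name, i) for i in ranked[:top_k])
        if not current.markers:
            break  # no known neighbour: the agent stops here
        current = network[choose_next_host(current, rng)]
    return results


if __name__ == "__main__":
    network = {
        "A": Host("A", {"a1": [0.0, 0.0], "a2": [5.0, 5.0]}, {"B": 1.0}),
        "B": Host("B", {"b1": [0.1, 0.1]}),
    }
    sim = make_similarity([[0.0, 0.0]])
    print(run_agent(network, "A", sim, hops=3, top_k=1, rng=random.Random(0)))
```

In this sketch only the similarity function travels with the agent, while the feature vectors stay on their host, which mirrors the network-load argument made above.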