An Automatic Multi-Agent Web Image and Associated Keywords Retrieval System

Nikolaos Papadakis
Computer and Communication Engineering Department, University of Thessaly, Volos, Greece
nkpap@telecom.ntua.gr

Klimis Ntalianis
Department of Telecommunications Science and Technology, University of Peloponnese, Tripoli, Greece
kntal@image.ntua.gr

Anastasios Doulamis
Decision Support Lab., Technical University of Crete, Chania, Greece
adoulam@cs.ntua.gr

George Stamoulis
Computer and Communication Engineering Department, University of Thessaly, Volos, Greece
george@inf.uth.gr

Abstract— Web-based image search engines and CBIR techniques are blind to the actual content. As a result, the answer to a query for a specific object is often cluttered with irrelevant data, leading to low precision. Furthermore, recall rates are also very low, since retrieval procedures are usually based either on context (surrounding text) and file captions or on low-level visual features. In this paper an automatic multi-agent image retrieval system is proposed. Our novel system exploits the format of multimedia sharing web sites to discover their underlying structure, in order to infer and extract multimedia files and their associated keywords from the web pages. The system first identifies the section of the web page that contains the multimedia file to be extracted and then extracts it by using clustering techniques and other tools of statistical origin. Experimental results on real-world image sharing web sites are presented and discussed, indicating the promising performance of the proposed system.

Keywords- Multimedia retrieval; automatic wrapper; multi-resolution visualization; web mining; multi-agent web data extraction

I. INTRODUCTION

During the last decade, a rapid increase in the size of digital image collections has been observed.
As the computational power of hardware and software and the available bandwidth have increased, the ability to store more complex data types on the Web has improved significantly. These new media types demand a different treatment during search and retrieval than pure text. In this direction, several Content-Based Image Retrieval (CBIR) methods have been proposed [1], some of which are based on multiple agents [2], [3], [4]. However, most methods are based on combinations of low-level features, which usually cannot provide semantic information. On the other hand, leading search engines such as Google and Yahoo retrieve web images by checking captions, the HTML page content and the surrounding text, information that may be irrelevant to the content of an image. Thus, on the one hand it is extremely difficult to develop a generic method that works on every web page, and on the other hand visual features lack semantics.

To overcome these problems, some wrapper-based methods have been proposed. For example, in [5] the user has to perform a sample query on a component called the provider and then mark the important elements in the web pages, thus guiding the generation process of the wrapper. The method also includes another component that addresses the problem of an eventual re-arrangement of the elements or simply the addition of some tags in the page. Another characteristic example is the work in [6], which is based on two observations about data records on the Web and on the use of a string matching algorithm. The first observation is that a group of data records containing descriptions of a set of similar objects is typically presented in a particular region of a page and is formatted using similar HTML tags. The HTML tags of a page are regarded as strings, so a string matching algorithm is used to find similar tag sequences.
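The first observation can be illustrated with a minimal sketch. It is not the algorithm of [6]: Python's standard difflib.SequenceMatcher stands in for the string matching step, and the HTML fragments and the helper names (tag_string, tag_similarity) are hypothetical, chosen only for illustration.

```python
from difflib import SequenceMatcher
from html.parser import HTMLParser


class TagSequence(HTMLParser):
    """Collects the start tags of an HTML fragment in document order."""

    def __init__(self):
        super().__init__()
        self.tags = []

    def handle_starttag(self, tag, attrs):
        # Only the tag name is kept; attributes and text are ignored.
        self.tags.append(tag)


def tag_string(fragment):
    """Return the tag sequence of a fragment, e.g. ['div', 'img', 'p']."""
    parser = TagSequence()
    parser.feed(fragment)
    parser.close()
    return parser.tags


def tag_similarity(fragment_a, fragment_b):
    """Similarity in [0, 1] between the tag sequences of two fragments."""
    matcher = SequenceMatcher(None, tag_string(fragment_a), tag_string(fragment_b))
    return matcher.ratio()


# Hypothetical fragments: two image records and one navigation list.
record_1 = '<div><img src="a.jpg"><p>sunset, beach</p></div>'
record_2 = '<div><img src="b.jpg"><p>mountain, snow</p></div>'
navigation = '<ul><li><a href="/">Home</a></li><li><a href="/top">Top</a></li></ul>'

print(tag_similarity(record_1, record_2))    # identical tag sequences
print(tag_similarity(record_1, navigation))  # no tags in common
```

In this sketch the two image records share the tag sequence div, img, p and score 1.0, while the navigation list shares no tags with them and scores 0.0. A real page would call for a tolerance threshold rather than exact equality, and the choice of that threshold is left open here.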
The second observation is that a group of similar data records placed in a specific region is reflected in the tag tree by the fact that the records lie under one parent node, which must be found.

In order to avoid human guidance and raw-tag manipulations, in this paper we propose a multi-agent system that automatically segments web pages into structural tokens. The proposed system is successfully applied to web image sharing sites. In particular, images are commonly presented in HTML pages that are mostly structured, but whose structure is not known in advance. The most obvious problem in designing a system for web image extraction is the lack of homogeneity in the structure of the source data found in web image sharing sites. In our case, managing this task is made somewhat easier by the fact that web image sharing sites do have some structure of their own: the image is presented in one part of a web page, while its associated keywords are placed in another part. This sort of structure is exploited in this paper to derive the structure of the data. In particular, a novel fully automated multi-agent scheme is presented that is able to segment a web page into structural tokens and select the tokens of interest (image and associated keywords). A key step towards retrieving the data of interest is to discover the sections contained in a web page and identify the ones holding the interesting information. To do that, our method is based on a
978-1-4244-4530-1/09/$25.00 ©2009 IEEE