Interactive document retrieval and classification

Ernest Valveny, Oriol Ramos, Joan Mas and Marçal Rossinyol

Abstract In this chapter we describe a system for document retrieval and classification following the interactive-predictive framework. In particular, the system addresses two different scenarios of document analysis: document classification based on visual appearance and logo detection. These two classical problems of document analysis are formulated following the interactive-predictive model, taking user interaction into account to ease the process of annotating and labelling the documents. A system implementing this model in a real scenario is presented and analyzed. This system also takes advantage of active learning techniques to speed up the task of labelling the documents.

1 Introduction

Huge amounts of documents are currently being stored as digital images at private and public organizations. However, for these raw digital images to be really useful, they need to be annotated with informative content. Document Image Analysis and Pattern Recognition techniques are at the heart of current solutions to this problem. However, when dealing with difficult unconstrained documents (see Figure 1), standard solutions (for instance, commercial OCR products) are simply not usable since, in the vast majority of these documents, elements can by no means be isolated automatically. Given the high error rates involved in post-editing solutions, only semi-automatic or computer-assisted alternatives can currently be foreseen. In this context, interactive tools emerge as a very appealing alternative to reduce the cost of labelling and annotating documents and, at the same time, as a way of obtaining user feedback to improve the model for classification and retrieval.
Hence, in this chapter we describe an interactive tool to annotate documents with semantic information, such as the category of the document or the location of relevant elements of the document that are difficult to isolate automatically.

Computer Vision Center, Dept. Ciències Computació, Universitat Autònoma de Barcelona

This tool follows