A Compact and Efficient Image Retrieval Approach Based on Border/Interior Pixel Classification

Renato O. Stehling ∗, Institute of Computing, University of Campinas, Brazil, renato.stehling@ic.unicamp.br
Mario A. Nascimento †, Dept. of Computing Science, University of Alberta, Canada, mn@cs.ualberta.ca
Alexandre X. Falcão ‡, Institute of Computing, University of Campinas, Brazil, afalcao@ic.unicamp.br

ABSTRACT
This paper presents BIC (Border/Interior pixel Classification), a compact and efficient CBIR approach suitable for broad image domains. It has three main components: (1) a simple and powerful image analysis algorithm that classifies image pixels as either border or interior, (2) a new logarithmic distance (dLog) for comparing histograms, and (3) a compact representation for the visual features extracted from images. Experimental results show that the BIC approach is consistently more compact, more efficient and more effective than state-of-the-art CBIR approaches based on sophisticated image analysis algorithms and complex distance functions. It was also observed that the dLog distance function has two main advantages over vectorial distances (e.g., L1): (1) it is able to increase substantially the effectiveness of (several) histogram-based CBIR approaches and, at the same time, (2) it reduces by 50% the space requirement to represent a histogram.

Categories and Subject Descriptors
H.2.8 [Database Management]: Database Applications—image databases; H.3.1 [Information Storage and Retrieval]: Content Analysis and Indexing—abstracting methods, indexing methods.

General Terms
Measurement, Algorithms, Experimentation.

Keywords
Content-Based Image Retrieval, CBIR, Distance Function, Color Histogram, Image Analysis

∗ Supported by FAPESP, Brazil.
† Partially supported by NSERC, Canada.
‡ Partially supported by CNPq, Brazil.
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. To copy otherwise, to republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. CIKM’02, November 4–9, 2002, McLean, Virginia, USA. Copyright 2002 ACM 1-58113-492-4/02/0011 ...$5.00.

1. INTRODUCTION
Recently, there has been great interest in techniques for Content-Based Image Retrieval (CBIR). This interest stems from the need to efficiently manage and search large volumes of multimedia information, mostly due to the exponential growth of the World-Wide Web (WWW).

CBIR is performed based on abstract descriptions of the images that are extracted during the image analysis phase. Image analysis algorithms may depend on the properties of the images being analyzed. These algorithms are usually distinct for different image domains, and gradually change when the focus moves from a narrow to a broad image domain [12]. A narrow image domain has a limited and predictable variability in all relevant aspects of its appearance. Collections of fingerprints, faces recorded over a clear background and X-rays of the human brain are examples of narrow image domains. A broad image domain, on the other hand, has an unlimited and unpredictable variability in image content. In general, the interpretation of an image's content is not unique, and the collection of images is very large. As a consequence, it is not possible to use semi-automatic techniques and domain-dependent knowledge during the analysis and comparison of images. The broadest collection of images today is probably the very large set of images available on the WWW.

In this paper, our focus is on CBIR techniques suitable for broad image domains.
In this scenario, low-level visual features of the images such as color and texture are especially useful for representing and comparing images automatically. In fact, color is the most commonly used low-level feature in CBIR systems. Color-based image retrieval techniques can be classified into three main categories: (1) global approaches [2, 6], (2) partition-based approaches (e.g., [11, 16]) and (3) regional approaches (e.g., [5, 13]). Each of these categories strikes a different trade-off among the complexity of the image analysis algorithm, the amount of space required to represent the visual features extracted from images, the complexity of the distance function used to compare these features, and the retrieval effectiveness [15].

Global approaches describe the visual content of an image as a whole, without spatial or topological information. Partition-based approaches introduce some spatial information about the visual content of the images by decomposing them into spatial cells according to a fixed scheme and describing the content of each cell individually. Regional approaches are a natural evolution of partition-based approaches in the sense that, instead of decomposing images in a