Approximate Image Color Correlograms
Claudio Taranto, Department of Computer Science, University of Bari "Aldo Moro", via E. Orabona, 4 - Bari, Italy, claudio.taranto@di.uniba.it
Nicola Di Mauro, Department of Computer Science, University of Bari "Aldo Moro", via E. Orabona, 4 - Bari, Italy, ndm@di.uniba.it
Stefano Ferilli, Department of Computer Science, University of Bari "Aldo Moro", via E. Orabona, 4 - Bari, Italy, ferilli@di.uniba.it
Floriana Esposito, Department of Computer Science, University of Bari "Aldo Moro", via E. Orabona, 4 - Bari, Italy, esposito@di.uniba.it
ABSTRACT
The recent explosion in Internet usage and the growing amount of digital images, driven by the increasingly ubiquitous presence of digital cameras, have created a demand for effective and flexible techniques for automatic image retrieval. As the volume of data increases, memory and processing requirements must grow at the same rapid pace, which is often prohibitively expensive. Image collections on this scale make even the most common and simple image processing and machine learning tasks non-trivial. In this paper we present a method to reduce the computational complexity of a widely known image indexing and retrieval technique based on a second-order statistical measure. The aim of the paper is twofold: Q1) is it possible to efficiently extract an approximate distribution of the image features with a low resulting error? Q2) how does the resulting approximate distribution affect similarity-based accuracy? In particular, we propose a sampling method to approximate the distribution of correlograms, adopting a Monte Carlo approach that computes the distribution on a subset of pixels uniformly sampled from the original image. A further variant also samples the neighborhood of each pixel.
Validation on the Caltech 101 dataset showed that the proposed approximate distribution, obtained with a considerable reduction in computational time, has a very low error when compared to the exact distribution. The results obtained in the second experiment, a similarity-based ranking task, are encouraging.
Categories and Subject Descriptors
I.4 [Computing Methodologies]: Image Processing and Computer Vision—Feature Measurement
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. To copy otherwise, to republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee.
MM'10, October 25–29, 2010, Firenze, Italy.
Copyright 2010 ACM 978-1-60558-933-6/10/10 ...$10.00.
General Terms
Experimental, Performance
Keywords
Color correlograms, Feature Extraction and Representation
1. INTRODUCTION
The rapid expansion of digital image libraries has encouraged the development of content-based image retrieval systems built on indexing and querying engines [7]. To index an image, the common approach is to extract low-level information, such as pixel color, intensity, texture and shape, which can be used to build a feature vector representing the image. This paper addresses the problem of computing such image feature vectors. In existing approaches, the feature vector is calculated by analysing every pixel of the image, which entails both a large computational cost and long indexing times. We propose a method to compute the image feature vector efficiently: we apply a Monte Carlo method to approximate the indexing step, analysing only a subset of the pixels of an image. We show that the resulting error is not significant when compared to the case in which all pixels are considered.
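The sampling idea can be illustrated with a minimal Python sketch. It assumes the color auto-correlogram of [2], read as the probability that a pixel at chessboard (L-infinity) distance k from a pixel of color c also has color c; it also assumes images given as 2-D lists of already-quantized color indices, and normalization by the number of in-image neighbors actually inspected. The names `ring`, `autocorrelogram`, `n_samples`, `neighbors_per_ring` and `seed` are hypothetical and are not taken from the paper. Uniform sampling of the reference pixels plays the role of the Monte Carlo approximation, and inspecting only a random subset of each distance-k ring mirrors the neighborhood-sampling variant.

```python
import random
from collections import defaultdict


def ring(x, y, k, h, w):
    """Pixels at chessboard (L-infinity) distance exactly k from (x, y), clipped to the image."""
    pts = []
    for dx in range(-k, k + 1):
        for dy in range(-k, k + 1):
            if max(abs(dx), abs(dy)) == k:
                nx, ny = x + dx, y + dy
                if 0 <= nx < h and 0 <= ny < w:
                    pts.append((nx, ny))
    return pts


def autocorrelogram(img, distances, n_samples=None, neighbors_per_ring=None, seed=0):
    """Estimate gamma[(c, k)] = Pr[color(q) == c | color(p) == c, dist_inf(p, q) == k].

    img is a 2-D list of already-quantized color indices (an assumption). With the
    defaults every pixel and every in-image neighbor is visited (exact statistic).
    Setting n_samples draws the reference pixels p uniformly without replacement
    (Monte Carlo); setting neighbors_per_ring inspects only a random subset of each
    distance-k ring (hypothetical parameter names, not the paper's).
    """
    rng = random.Random(seed)
    h, w = len(img), len(img[0])
    pixels = [(x, y) for x in range(h) for y in range(w)]
    if n_samples is not None:
        pixels = rng.sample(pixels, min(n_samples, len(pixels)))
    hits = defaultdict(int)   # (color, k) -> same-color neighbors seen
    total = defaultdict(int)  # (color, k) -> neighbors inspected
    for x, y in pixels:
        c = img[x][y]
        for k in distances:
            nbrs = ring(x, y, k, h, w)
            if neighbors_per_ring is not None and len(nbrs) > neighbors_per_ring:
                nbrs = rng.sample(nbrs, neighbors_per_ring)
            for nx, ny in nbrs:
                total[(c, k)] += 1
                if img[nx][ny] == c:
                    hits[(c, k)] += 1
    # (color, k) pairs with no inspected neighbor (e.g. k beyond the image) are omitted.
    return {key: hits[key] / total[key] for key in total}


if __name__ == "__main__":
    gen = random.Random(42)
    img = [[gen.randrange(8) for _ in range(48)] for _ in range(48)]
    exact = autocorrelogram(img, [1, 3, 5])
    approx = autocorrelogram(img, [1, 3, 5], n_samples=300, neighbors_per_ring=8)
    # Largest absolute deviation over the (color, k) entries of the exact statistic.
    err = max(abs(exact[key] - approx.get(key, 0.0)) for key in exact)
    print(f"max abs error: {err:.3f}")
```

Because the hit and total counts are integers accumulated independently of visiting order, sampling every pixel reproduces the exact statistic, while the cost of the approximate version grows with the number of sampled pixels (and inspected neighbors) rather than with the image size. The ratio of sampled counts is a consistent but not exactly unbiased estimator of each conditional probability.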
The first approach to calculating feature vectors is based on the color histogram [4, 6, 5]. This method relies entirely on pixel colors: for each color level it records the number of pixels having that color. For this reason, the image is usually first converted into a color space (such as RGB or HSV). The major limitation of histogram-based methods is that they capture only global information, so two semantically different images may have very similar histograms. An improved statistic is the joint histogram [3], which includes not only color information but also other features, such as edges, texture and brightness. This method is more accurate than the histogram, but it suffers from the same problem of characterising an image with global information only. In [2], a new approach named color correlogram, which combines both global and local image information, was presented. This statistic describes how pixels with a given color are spatially distributed in the image. The