Object-Based Image Retrieval Using Active Nets

David García-Pérez, Antonio Mosquera
Departamento de Electrónica y Computación
Universidad de Santiago de Compostela
davidgp@usc.es, mosquera@dec.usc.es

Stefano Berretti, Alberto Del Bimbo
Dipartimento di Sistemi e Informatica
University of Firenze
{berretti, delbimbo}@dsi.unifi.it

Abstract

In this work, we propose a method for extracting relevant objects from images and matching them for retrieval. Objects are represented by a two-dimensional deformable structure, referred to as an active net, that is capable of adapting to relevant image regions according to chromatic and edge information. In particular, this representation allows a joint description of the color, shape and structural information of extracted objects. A similarity measure between active nets is also defined and validated in a set of retrieval experiments on the ETH-80 objects database.

1 Introduction

Effective access to modern archives of digital images requires that conventional searching techniques based on textual keywords be extended by content-based queries addressing visual features of the searched data. To this end, a number of models have been investigated that make it possible to represent and compare images in terms of quantitative indexes of visual features. In particular, different techniques have been identified and tested to represent the content of single images according to low-level features, such as color, texture, shape and structure; intermediate-level features of saliency and spatial relationships; or high-level traits modeling the semantics of image content [4, 9]. In so doing, extracted features may refer either to the overall image (e.g., a color histogram) or to any subset of pixels constituting a spatial entity with some visual cohesion in the user's perception (e.g., an object). Among these approaches, image representations based on chromatic indexes have been widely used in general-purpose image retrieval systems [5].
However, such approaches are not appropriate for precise retrieval that accounts for perceptual details in the image. Region-based solutions are better suited to this end. In fact, much recent research has focused on region-based techniques that allow the user to specify a particular region of an image and search for images containing similar regions. However, most existing region- or object-based systems rely on color segmentation [3], causing these systems to fail when accurate segmentation is not possible. In contrast to color-based approaches, other retrieval schemes are based entirely on shape content. Most of the work on region-shape recognition relies on matching sets of local image features (e.g., edges, lines and corners), usually through statistical analysis that disregards relational information among the extracted features. Most of these methods have proved adequate only for simple, flat and man-made objects.

Only a few approaches have tried to combine color and shape information to improve the significance of object representations. For example, in [7], color and shape invariants are combined into a histogram-based representation which basically relies on the invariant properties of sets of points in the image. However, many of the current solutions suffer from the difficulty of extracting effective object representations capable of jointly capturing the color, shape and structural information of image objects.

In this paper, we propose a descriptor modeled through a graph which accounts for the structural elements and color of regions (objects) of interest in an image. This graph directly corresponds to an elastic structure (active net) which, through a deformation process, is used to separate regions from the background. In particular, due to their deformable structure, active nets can adapt to the borders and internal part of a region, encoding at the same time information on the color, shape and spatial structure of the region.
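
To make the idea of a net that encodes color and spatial structure more concrete, the following is a minimal sketch and not the method proposed in this paper. It assumes a fixed regular grid of nodes laid over an image (no deformation or adaptation is performed), stores the pixel color at each node, and links 4-connected neighbors with edges weighted by Euclidean distance. The function name `grid_net_to_graph` and the dictionary layout are hypothetical choices made only for illustration.

```python
import math


def grid_net_to_graph(image, rows, cols):
    """Lay a fixed rows x cols net over an image and return it as a graph.

    Assumption: `image` is a list of pixel rows, each pixel an (r, g, b) tuple.
    Returns (nodes, edges):
      nodes maps (i, j) -> {"pos": (y, x), "color": (r, g, b)}
      edges maps ((i, j), (k, l)) -> Euclidean distance between node positions
    """
    height, width = len(image), len(image[0])
    nodes = {}
    for i in range(rows):
        for j in range(cols):
            # Spread nodes evenly over the image, borders included.
            y = round(i * (height - 1) / (rows - 1)) if rows > 1 else 0
            x = round(j * (width - 1) / (cols - 1)) if cols > 1 else 0
            nodes[(i, j)] = {"pos": (y, x), "color": image[y][x]}

    edges = {}
    for (i, j), node in nodes.items():
        # Link each node to its lower and right neighbors (4-connectivity).
        for neighbor in ((i + 1, j), (i, j + 1)):
            if neighbor in nodes:
                edges[((i, j), neighbor)] = math.dist(
                    node["pos"], nodes[neighbor]["pos"]
                )
    return nodes, edges


# Usage on a synthetic 5 x 5 image whose color varies with position.
image = [[(x * 10, y * 10, 0) for x in range(5)] for y in range(5)]
nodes, edges = grid_net_to_graph(image, 3, 3)
```

In an actual active net the node positions would move during the deformation process, so the edge lengths and node colors would reflect the shape and appearance of the object rather than a fixed grid.
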
The use of an illumination-invariant color space for the detection of region borders also makes the model partially robust to changes in illumination conditions. Once the net has adapted to an object, it is transformed into a graph accounting for the region's chromatic content at the nodes, and for their relative distance and spatial position. Finally, based on a similarity measure defined between graph representations, graph models are compared to support region-based retrieval.

The rest of the paper is organized in three sections and a conclusion. In Sect. 2, the active net model is defined. In particular, the dynamic adaptation of active nets to relevant image objects based on color and edge image information is discussed. In Sect. 3, the model is cast to a graph representation in order to support effective and efficient comparison between two nets. Retrieval experiments on the ETH-80

0-7695-2521-0/06/$20.00 (c) 2006 IEEE