IEEE TRANSACTIONS ON PATTERN ANALYSIS AND MACHINE INTELLIGENCE

Distinct Multi-Colored Region Descriptors for Object Recognition

Sarif Kumar Naik and C. A. Murthy

Abstract— This paper considers the problem of object recognition. Objects are represented by color descriptions taken from distinct regions that cover multiple segments. These distinct multi-colored regions are detected using edge maps and clustering. The performance of the proposed methodologies has been evaluated on three datasets, and the results are found to be better than those of existing methods when a small number of training views is considered.

Index Terms—Object representation, Object descriptor, Object recognition, Object matching, Image representation

I. INTRODUCTION

The main challenges in object recognition are the efficient representation of objects and the comparison of two objects through their representations. Broadly speaking, there are two types of approaches to object representation. The first utilizes the knowledge gained from the spatial arrangements of "shape features" such as edge elements, boundaries, corners and junctions; the other uses brightness or color features obtained more directly from the object images [1]. However, any algorithm that uses only shape features or only color features has limitations. The representation scheme should carry both the color information and its pattern of appearance on the object surface.

This study proposes a scheme to describe an object in such a way that the description contains the color information as well as the patterns of colors on the object surface. Note that, in most cases, wherever there is shape or structural information in the object, the corresponding patterns in the image possess discontinuities in color. Thus, extracting information about the patterns of colors automatically leads to extracting shape and structural information about the object.
There are several approaches to object representation, such as histogram based, eigenspace based, edge and corner based, and graph based representations. Among histogram based methods, the work by Swain and Ballard [2] is one of the earliest to use color as a primary cue for object recognition and image retrieval. Stricker [3] introduced an indexing technique based on boundary histograms of multi-colored objects. The histogram based approach is attractive for object recognition because of its simplicity, speed and robustness [4]. Although it is simple, its main drawbacks are its inability to encode the shape and structural information of the objects, and its use of color information alone to distinguish between objects.

The standard procedure in eigenspace based methods is to represent an object by considering the whole image as a vector and projecting it onto a set of eigenvectors to achieve data compression and reduce redundant information. Generally, the eigenvectors corresponding to the dominant eigenvalues are found using Principal Component Analysis (PCA). Some of the earliest works on object recognition using eigenspace based representation are by Murase and Nayar [5], [6] and Turk and Pentland [7]. These methods are effective when the eigenspace captures the characteristics of the whole database, for example, when all the object images have a uniform, known background. If there is a large variation in the images, the performance of these methods can deteriorate. Such methods are best suited for recognition of an object that constitutes a complete image [8].

Authors are with the Machine Intelligence Unit of Indian Statistical Institute, 203 B. T. Road, Kolkata - 700108, India. E-mail: sarif r, murthy @isical.ac.in

In graph based representation, generally, regions with their corresponding feature vectors and the geometric relationships between these regions are encoded in the form of a graph. Tu et al.
[8] proposed a method which segments the image into regions of approximately constant color and represents the geometrical relationship of the segmented colored regions by an attributed graph. Object matching is then formulated as an approximate graph-matching problem. Methods such as the Color Adjacency Graph (CAG) [9], the Attributed Relational Graph (ARG) [10] and the Shock graph [11] are prominent in this approach. Kostin et al. [12] proposed an object recognition scheme using graph matching. One advantage of graph based representation is that the geometric relationship can be used to encode certain shape information of the object, and any sub-graph matching algorithm can be used to identify single as well as multiple objects in query images. However, matching two such representations becomes a complicated process. Some of the issues in this regard are discussed in [12].

Support Vector Machine (SVM) based methods are used to classify both globally and locally obtained feature vectors of the objects [13], [14]. Roth et al. [15] proposed a view based algorithm for 3D object recognition using a network of linear units. The Sparse Network of Winnows (SNoW) learning architecture is used to learn the representations of objects. They carried out two separate experiments, one using a pixel-based representation and one using an edge-based representation of the objects.

In general, the methods discussed above use representation schemes that are global in nature. Global representation schemes have certain shortcomings, which can be overcome using local representations. In local representation schemes, generally, information from several regions of the image is encoded. Some of the local representation schemes are Local Affine Frames (LAF) [16], Scale Invariant Feature Transform (SIFT) [17], Shape Context [1] and Multi-modal Neighborhood Signature (MNS). However, SIFT and Shape Context are designed for gray scale images. Maree et al.
[18] have proposed a generic approach to image classification based on decision tree ensembles and local sub-windows, and improvements upon this method are reported in [19].

Two methods are proposed in this article for object recognition. Section II describes the motivation of the work. Section III presents a scheme to represent an object image using the descriptions of different regions of interest; two different schemes are proposed to extract the regions of interest. Section IV defines two dissimilarity measures: one to compare two regions of interest and the other to compare two object images through their descriptors. Section V contains a brief description of the datasets used and comparisons with the existing methods. The article is concluded with a discussion on the proposed methodology in Section VI.

II. MOTIVATION FOR PROPOSED METHOD

Two important cues to distinguish between two objects are the overall shape and structure of the object, and the occurrence of different colors with respect to their spatial arrangements. Generally, human beings use both cues, at different stages, to distinguish objects. Although it is difficult to imagine the actual shape of an object from the spatial arrangement of the different segments on its surface, this arrangement can be used as an important cue to represent objects for classification. Several psychological studies regarding the representation of shape have been discussed in [20], and a survey of the literature in this regard is provided in [21].

A way of preserving the positional information of adjacent segments is to store their representative color vectors as a unit. The region of interest is a connected set that covers pixels from all the adjacent segments and contains the color information from these segments. We call such a region a "Multi-Colored Neighborhood (MCN)". Six examples of such MCNs are shown in Fig. 1.
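
The idea of storing the colors of adjacent segments as a unit can be illustrated with a small sketch. This is not the authors' procedure. It assumes that a segmentation label map is already available, that the neighborhood is a square window, and that each segment is summarized by its mean color inside the window. The function name `mcn_descriptor` and its parameters are hypothetical and chosen only for this illustration.

```python
import numpy as np

def mcn_descriptor(image, labels, center, half_size):
    """Hypothetical helper: one mean color per segment inside a square window.

    image  : H x W x C array of colors
    labels : H x W integer array of segment ids (assumed to be given)
    center : (row, col) of the window center
    half_size : the window spans center +/- half_size, clipped to the image
    """
    row, col = center
    r0, r1 = max(row - half_size, 0), min(row + half_size + 1, image.shape[0])
    c0, c1 = max(col - half_size, 0), min(col + half_size + 1, image.shape[1])

    # Flatten the window so pixels and their segment ids line up one to one.
    window_pixels = image[r0:r1, c0:c1].reshape(-1, image.shape[2]).astype(float)
    window_labels = labels[r0:r1, c0:c1].reshape(-1)

    # One representative (mean) color vector per segment present in the window.
    segment_ids = np.unique(window_labels)
    colors = np.stack(
        [window_pixels[window_labels == s].mean(axis=0) for s in segment_ids]
    )
    return segment_ids, colors

# Toy 4 x 4 image: left half red (segment 0), right half blue (segment 1).
image = np.zeros((4, 4, 3), dtype=np.uint8)
image[:, :2] = (255, 0, 0)
image[:, 2:] = (0, 0, 255)
labels = np.zeros((4, 4), dtype=int)
labels[:, 2:] = 1

# A window straddling the red/blue boundary covers both segments.
ids, colors = mcn_descriptor(image, labels, center=(1, 2), half_size=1)
print(ids)
print(colors)
```

In this toy case the window straddles the boundary, so the returned unit holds two color vectors, one per segment. A window lying entirely inside one segment would return a single color and would not be multi-colored in the sense used here.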