Evaluation of Various Aggregation Operators Applied to a Content Based Image Retrieval System

Mircea Ionescu
Department of ECECS, University of Cincinnati, Cincinnati, OH 45221-0030, USA
ionescmm@ececs.uc.edu

Anca Ralescu
Department of ECECS, University of Cincinnati, Cincinnati, OH 45221-0030, USA
aralescu@ececs.uc.edu

Abstract: The Fuzzy Hamming Distance (FHD) is successfully used in a Content-Based Image Retrieval (CBIR) system as a similarity measure. The system performs an m × n partitioning of the compared images and evaluates the FHD for each pair of corresponding partitions. In the last step the FHDs are defuzzified and the results are combined into a final score. In order to take full advantage of the use of fuzzy sets, the current study investigates the possibility of reversing the order of the defuzzification and aggregation steps: aggregate the fuzzy sets first, then defuzzify the final result for ranking. Several t-norms and their associated t-conorms are tested as aggregation operators. The results are illustrated on retrieval operations from an image database.

Keywords: fuzzy sets aggregation, image retrieval systems, CBIR, fuzzy Hamming distance.

I. INTRODUCTION

The goal of content based image retrieval (CBIR) systems is to allow querying image databases in a natural way, by image content. To achieve this goal, various features, such as sketches, layout or structural description, texture, and colors, are extracted from each image. A query might be "Find all images with a pattern similar to this one" (the query pattern), and the system then finds a subset of images similar to the query image. QBIC (Query By Image Content), from IBM [1], is one of the earliest such systems; it uses a weighted Euclidean distance on colors, texture, shape and sketch to assess the similarity between two images. In previous studies [2], [3], [4], the Fuzzy Hamming Distance (FHD) has been used to assess content similarity between two images based on color information.
In such an approach, aggregation is expected to play an important role in obtaining a final similarity score.

II. THE FUZZY HAMMING DISTANCE

The Fuzzy Hamming Distance [5] is a generalization of the Hamming distance over the set of real-valued vectors. It is defined as the (fuzzy) number of different components of the input vectors, captured by the difference fuzzy set. The resulting fuzzy set gives the degree to which the input vectors differ in 0, 1, ..., n components, where n is the size of the vectors. In short, the Fuzzy Hamming Distance is the fuzzy cardinality of the difference fuzzy set. A crisp version of it, nFHD, can also be defined; when the input vectors are binary, it reduces to the classical Hamming distance. Moreover, the FHD can be parameterized to control, in a context-dependent way, how large a difference must be in order to be considered meaningful. Previous studies [5], [2] show that the FHD can be viewed as an adaptive decomposition of the Euclidean distance (and of other distances, such as the Minkowski distance).

III. OVERVIEW OF THE CONTENT BASED IMAGE RETRIEVAL SYSTEM

The CBIR system proposed in [2], [4] and further studied in [3] consists of three modules, as shown in Fig. 1:

1) The Preprocessing Module splits each image (the query image and every image in the database) into partitions of granularity m × n, in order to include position information; from each partition it extracts the information of interest (in this study, the color histograms). The output is a collection of color histograms, one for each partition, stored as real-valued vectors.

2) The Similarity Assessment Module takes as input the information from the preprocessing module and computes the similarity (actually the FHD) between the query image and each image in the database. The output of this module is a collection of fuzzy sets (FHDs).

3) The Ranking Module defuzzifies its input (the FHD of each partition), aggregates the results into a score (e.g. using a weighted sum), and ranks the scores in decreasing order.
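The per-partition computation described above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the authors' implementation: the text does not give the formulas, so the difference membership 1 - exp(-alpha * (x_i - y_i)^2), the fuzzy cardinality min(mu_(k), 1 - mu_(k+1)) over memberships sorted in decreasing order, and the centroid used as a stand-in for nFHD are all assumed here. The function names, the parameter `alpha`, and the `weights` argument are hypothetical, and the color histograms are assumed to be already extracted, one per partition.

```python
import math
from typing import List, Sequence


def difference_fuzzy_set(x: Sequence[float], y: Sequence[float],
                         alpha: float = 1.0) -> List[float]:
    """Degree to which each pair of components differs.

    ASSUMPTION: membership 1 - exp(-alpha * (x_i - y_i)^2); alpha controls
    how large a difference must be to count as meaningful.
    """
    return [1.0 - math.exp(-alpha * (a - b) ** 2) for a, b in zip(x, y)]


def fuzzy_cardinality(memberships: Sequence[float]) -> List[float]:
    """Fuzzy set over k = 0..n: degree to which the set has k elements.

    ASSUMPTION: card(k) = min(mu_(k), 1 - mu_(k+1)), with memberships sorted
    in decreasing order and the conventions mu_(0) = 1, mu_(n+1) = 0.
    """
    mu = [1.0] + sorted(memberships, reverse=True) + [0.0]
    n = len(memberships)
    return [min(mu[k], 1.0 - mu[k + 1]) for k in range(n + 1)]


def fhd(x: Sequence[float], y: Sequence[float],
        alpha: float = 1.0) -> List[float]:
    """FHD as the fuzzy cardinality of the difference fuzzy set."""
    return fuzzy_cardinality(difference_fuzzy_set(x, y, alpha))


def defuzzify_centroid(fuzzy_number: Sequence[float]) -> float:
    """Extract one representative point from a fuzzy set over 0..n.

    ASSUMPTION: centroid defuzzification, used here in place of nFHD.
    """
    total = sum(fuzzy_number)
    if total == 0.0:
        return 0.0
    return sum(k * m for k, m in enumerate(fuzzy_number)) / total


def baseline_score(query_hists: Sequence[Sequence[float]],
                   image_hists: Sequence[Sequence[float]],
                   weights: Sequence[float],
                   alpha: float = 1.0) -> float:
    """Ranking Module order: defuzzify each partition's FHD, then
    aggregate the crisp values with a weighted sum."""
    return sum(
        w * defuzzify_centroid(fhd(q, h, alpha))
        for q, h, w in zip(query_hists, image_hists, weights)
    )
```

Under these assumptions, binary inputs with a large `alpha` recover the classical Hamming distance: `defuzzify_centroid(fhd([0, 1, 1, 0], [1, 1, 0, 0], alpha=50.0))` evaluates to 2, the number of positions in which the two vectors differ.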
The current study investigates the possibility of reversing the order of the defuzzification and aggregation steps. The motivation is as follows: since defuzzification of a fuzzy set consists of extracting a (representative) point from it, its result is necessarily an approximation of the original set. Additionally, in the context of the current CBIR system, where defuzzification is followed by aggregation of scores, the number of defuzzification operations is equal to m × n (the partition granularity), and therefore such approximations may ultimately impact the outcome of the system. Reversing the order of the aggregation and defuzzification steps would then result in one aggregation step (this time of fuzzy sets, rather than of crisp numeric scores) and one defuzzification operation, which yields the final score. Only t-norm and t-conorm operators are used to aggregate the fuzzy sets; other aggregation operators are omitted for rea-