D.-Y. Yeung et al. (Eds.): SSPR&SPR 2006, LNCS 4109, pp. 412–421, 2006. © Springer-Verlag Berlin Heidelberg 2006

An Efficient Distance Between Multi-dimensional Histograms for Comparing Images

Francesc Serratosa and Gerard Sanromà

Universitat Rovira i Virgili, Dept. d’Enginyeria Informàtica i Matemàtiques, Spain
francesc.serratosa@urv.cat, gerard.sanroma@urv.cat

Abstract. The aim of this paper is to present an efficient distance between n-dimensional histograms. Some image classification and image retrieval techniques use the distance between histograms as a first step of the classification process. For this reason, several algorithms that compute the distance between histograms have been proposed in the literature. Nevertheless, most of this research has been applied to one-dimensional histograms, because computing a distance between multi-dimensional histograms is very expensive. In this paper, we present an efficient method to compare multi-dimensional histograms in O(2z), where z represents the number of bins. Results show a large reduction in computation time with no reduction in the recognition ratio.

1 Introduction

Finding the distance or similarity between histograms is an important issue in image classification and image retrieval, since a histogram represents the frequency of the pixel values in an image. For this reason, a number of measures of similarity between histograms have been proposed and used in computer vision and pattern recognition. Moreover, if the position of the pixels is unimportant for the distance measure, we can compute the distance between images efficiently by computing the distance between their histograms.

Most of the distance measures presented in the literature (there is an interesting compilation in [1]) compute the distance value as a function of the overlap or intersection between the two histograms, but they do not take into account the similarity of the non-overlapping parts of the two histograms.
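The limitation of overlap-based measures can be illustrated with a minimal sketch. The function names and the toy histograms below are assumptions made for illustration only, and the cross-bin measure (sum of absolute differences of the cumulative histograms, valid for one-dimensional histograms of equal mass) is a standard textbook construction, not the method proposed in this paper.

```python
def intersection_distance(h1, h2):
    """One minus the normalised histogram intersection (bin-by-bin overlap)."""
    overlap = sum(min(a, b) for a, b in zip(h1, h2))
    return 1.0 - overlap / sum(h1)


def cumulative_distance(h1, h2):
    """Cross-bin distance for 1-D histograms of equal mass.

    Sums the absolute difference of the running totals, so mass that
    has to travel across more bins contributes more to the result.
    """
    carried = 0  # surplus mass still to be moved to later bins
    work = 0
    for a, b in zip(h1, h2):
        carried += a - b
        work += abs(carried)
    return work


# Hypothetical 5-bin histograms, each holding 4 pixels in a single bin.
base = [4, 0, 0, 0, 0]
near = [0, 4, 0, 0, 0]  # mass shifted by one bin
far = [0, 0, 0, 0, 4]   # mass shifted by four bins

# The overlap is zero in both cases, so the intersection-based distance
# cannot tell "near" from "far".
print(intersection_distance(base, near), intersection_distance(base, far))

# The cross-bin measure grows with the size of the shift.
print(cumulative_distance(base, near), cumulative_distance(base, far))
```

In this toy case the intersection-based distance gives the same value for both comparisons, whereas the cross-bin measure ranks the one-bin shift as closer than the four-bin shift.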
For this reason, Rubner presented in [2] a new definition of the distance measure between n-dimensional histograms that overcomes this problem of the non-overlapping parts. It is called the Earth Mover’s Distance and is defined as the minimum amount of work that must be performed to transform one histogram into the other by moving distribution mass.

Often, for a specific set of measurements, only a small fraction of the bins in a histogram contain significant information; that is, most of the bins are empty. This happens more often as the number of dimensions of the histogram increases. In such cases, methods that treat histograms as fixed-size structures are inefficient. In the algorithm described by Rubner [2] to find the Earth Mover’s Distance, the empty bins were not explicitly considered. They used the simplex algorithm [3] to compute the distance measure and the method presented in [4] to find a good initialisation. The computational cost of the simplex iteration is O(z’²), where z’ is the number of