D.-Y. Yeung et al. (Eds.): SSPR&SPR 2006, LNCS 4109, pp. 412 – 421, 2006.
© Springer-Verlag Berlin Heidelberg 2006
An Efficient Distance Between Multi-dimensional
Histograms for Comparing Images
Francesc Serratosa and Gerard Sanromà
Universitat Rovira i Virgili, Dept. d’Enginyeria Informàtica i Matemàtiques, Spain
francesc.serratosa@urv.cat, gerard.sanroma@urv.cat
Abstract. The aim of this paper is to present an efficient distance between
n-dimensional histograms. Some image classification or image retrieval
techniques use the distance between histograms as a first step of the
classification process. For this reason, several algorithms that compute the
distance between histograms have been proposed in the literature. Nevertheless,
most of this research has been applied to one-dimensional histograms because
computing a distance between multi-dimensional histograms is very expensive.
In this paper, we present an efficient method to compare multi-dimensional
histograms in O(2z), where z represents the number of bins. Results show a
large reduction in computation time with no reduction in the recognition
ratio.
1 Introduction
Finding the distance or similarity between histograms is an important issue in image
classification or image retrieval since a histogram represents the frequency of the
values of the pixels among the images. For this reason, a number of measures of
similarity between histograms have been proposed and used in computer vision and
pattern recognition. Moreover, if the position of the pixels is unimportant while
considering the distance measure, we can compute the distance between images in an
efficient way by computing the distance between their histograms.
Most of the distance measures presented in the literature (there is an interesting
compilation in [1]) consider the overlap or intersection between two histograms as a
function of the distance value but they do not take into account the similarity on the
non-overlapping parts of the two histograms. For this reason, Rubner presented in [2]
a new definition of the distance measure between n-dimensional histograms that
overcomes this non-overlapping parts problem. It was called Earth Mover’s Distance
and it is defined as the minimum amount of work that must be performed to transform
one histogram into the other one by moving distribution mass.
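
The difference between an overlap-based measure and the Earth Mover's Distance can be illustrated with a minimal sketch. It is not the algorithm of [2], which handles n-dimensional histograms through the simplex method. It assumes one-dimensional histograms of equal total mass and a ground distance of |i - j| between bins i and j, a special case in which the minimum work equals the sum of absolute differences between the cumulative histograms. The function names l1_distance and emd_1d are illustrative only, and the work is left unnormalised.

```python
from itertools import accumulate


def l1_distance(a, b):
    """Bin-to-bin measure: compares only bins that share the same index."""
    return sum(abs(x - y) for x, y in zip(a, b))


def emd_1d(a, b):
    """Minimum work to turn histogram a into histogram b.

    Assumes 1-D histograms with the same number of bins, equal total mass,
    and ground distance |i - j| between bins i and j (illustrative case).
    """
    if len(a) != len(b):
        raise ValueError("histograms must have the same number of bins")
    if sum(a) != sum(b):
        raise ValueError("histograms must have equal total mass")
    # The difference of the cumulative sums at bin i is the mass that has
    # to cross the boundary between bin i and bin i + 1.
    return sum(abs(ca - cb) for ca, cb in zip(accumulate(a), accumulate(b)))


reference = [4, 0, 0, 0, 0]
near = [0, 4, 0, 0, 0]   # same mass, shifted by one bin
far = [0, 0, 0, 0, 4]    # same mass, shifted by four bins

print(l1_distance(reference, near), l1_distance(reference, far))
print(emd_1d(reference, near), emd_1d(reference, far))
```

In this example neither shifted histogram overlaps the reference, so the bin-to-bin measure assigns both the same value (8), whereas the work-based measure separates the near shift (4) from the far one (16).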
Often, for specific set measurements, only a small fraction of the bins in a
histogram contain significant information, that is, most of the bins are empty. This
becomes more frequent as the number of dimensions of the histograms increases. In
those cases, methods that treat histograms as fixed-size structures are inefficient.
In the algorithm described by Rubner [2] to find the Earth Mover’s Distance, the empty
bins were not explicitly considered. They used the simplex algorithm [3] to compute
the distance measure and the method presented in [4] to find a good initialisation.
The computational cost of the simplex iteration is O(z’²), where z’ is the number of