Metrika (1997) 46:41-57
The Multiresolution Histogram
JOACHIM ENGEL
PH Ludwigsburg, Reuteallee 46, Postfach 220, 71602 Ludwigsburg, Germany
Abstract: We introduce a new method for locally adaptive histogram construction that doesn't
resort to a standard distribution and is easy to implement: the multiresolution histogram. It is
based on an $L_2$ analysis of the mean integrated squared error with Haar wavelets and hence can be
associated with a multiresolution analysis of the sample space.
Key Words and Phrases: histogram; bin size selection; multiresolution analysis; wavelets.
1 Introduction
The histogram is the oldest and most widely used nonparametric density estimator.
It is intuitively very plausible and easy to compute. The histogram
requires a partition of the sample space into sets $B_k$, $k = 1, \dots, m$, and is
defined as
$$\hat{f}(x) = \frac{1}{n\,\lambda(B_k)}\,\#\{i \mid X_i \in B_k\} \tag{1}$$
for $x \in B_k$. Here $X_1, \dots, X_n$ denote the data, assumed to be independent
observations of a random variable $X$ with unknown density $f$, and $n$ is the
sample size. We consider only the case of one-dimensional observations, i.e.
$B_k \subset \mathbb{R}$, with $B_k$ some real interval whose Lebesgue measure is $\lambda(B_k)$. The
simplest case is an equal bin size histogram. Then the $B_k$ are determined
through the choice of an origin $x_0$ and a bin size or cell width $h$ as
$B_k = [x_0 + (k - 1)h,\; x_0 + kh)$. The above formula (1) then takes the form
$$\hat{f}(x) = \frac{1}{nh}\,\#\{i \mid X_i \in [x_0 + (k - 1)h,\; x_0 + kh)\}. \tag{2}$$
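As an illustration of formula (2), the following minimal Python sketch computes the equal bin size histogram for a given origin and bin size. The function name `equal_bin_histogram`, the argument names `x0` and `h`, and the sample values are assumptions made for this example only; they do not come from the paper, and the sketch is not the multiresolution method itself.

```python
import math
from collections import Counter


def equal_bin_histogram(data, x0, h):
    """Return a function x -> density estimate following formula (2).

    data : observations X_1, ..., X_n
    x0   : origin of the partition (user-chosen, hypothetical name)
    h    : bin size, must be positive
    """
    if h <= 0:
        raise ValueError("bin size h must be positive")
    n = len(data)

    def bin_index(x):
        # Index k such that x lies in [x0 + (k - 1)h, x0 + kh).
        return math.floor((x - x0) / h) + 1

    # Number of observations falling into each bin.
    counts = Counter(bin_index(x) for x in data)

    def f_hat(x):
        # Count of observations in the bin of x, scaled by 1 / (n h).
        return counts[bin_index(x)] / (n * h)

    return f_hat


# Illustrative data, not taken from the paper.
sample = [0.1, 0.4, 0.45, 0.9, 1.2, 1.3, 1.35, 2.6]
f_hat = equal_bin_histogram(sample, x0=0.0, h=0.5)
print(f_hat(0.3))  # 0.75: the bin [0, 0.5) holds 3 of the 8 points
```

Because each bin contributes its relative frequency divided by $h$, the resulting step function integrates to one. Rerunning the sketch with a different `h` or `x0` shows how strongly the estimate depends on these two choices.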
The shape of the histogram and its quality as an estimator of the density $f$
depend decisively on the choice of the bin size $h$. If $h$ is too large, then all
0026-1335/97/46:1/41-57 $2.50 © 1997 Physica-Verlag, Heidelberg