IEEE TRANSACTIONS ON INFORMATION THEORY, VOL. 54, NO. 2, FEBRUARY 2008 867
Quantization for Nonparametric Regression
László Györfi, Fellow, IEEE, and Marten Wegkamp
Abstract—The authors discuss quantization, or clustering, of nonparametric regression estimates. The main tools developed are oracle inequalities for the rate of convergence of constrained least squares estimates. These inequalities yield fast rates both for nonparametric (unconstrained) least squares regression and for clustering of partition regression estimates and plug-in empirical quantizers. The bounds on the rate of convergence generalize known results for bounded errors to subGaussian errors.
Index Terms—Regression estimation with restriction, least squares estimates, vector quantization, finite-sample bounds.
I. INTRODUCTION
The main aim of multivariate regression analysis is to predict the response $Y$, taking values in $\mathbb{R}^d$, given a feature $X$. In most cases, assuming that $\mathbb{E}\|Y\|^2 < \infty$, where $\|\cdot\|$ denotes the Euclidean norm, this is achieved via (an estimate of) the regression function
$$m(x) = \mathbb{E}[Y \mid X = x]$$
that minimizes the mean squared error
$$\mathbb{E}\|f(X) - Y\|^2 = \mathbb{E}\|f(X) - m(X)\|^2 + \mathbb{E}\|m(X) - Y\|^2$$
over all measurable $f$. We do not impose any restrictions on the probability distribution of $X$; the coordinates of $X$ may have various types of distributions, some of them may be discrete (for example, binary), others may be absolutely continuous.
An added complication in this note is that the candidate functions $f$ each have a finite codebook of size $N$, a collection of $N$ distinct vectors in $\mathbb{R}^d$. In this data compression setting we seek the $f^*$, with a codebook of size $N$, that minimizes the risk $\mathbb{E}\|f(X) - Y\|^2$ over all $f$ with codebooks of size $N$. Equivalently, in view of the mean squared error decomposition above, we try to cluster $m(X)$ into $N$ groups without ever observing $m(X)$; we observe $Y$ instead.
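The clustering interpretation rests on the cross term in the mean squared error vanishing; a short verification, valid under the square-integrability assumption already made:

```latex
\begin{align*}
\mathbb{E}\|f(X)-Y\|^2
  &= \mathbb{E}\|f(X)-m(X)\|^2
   + 2\,\mathbb{E}\big[(f(X)-m(X))^{\top}(m(X)-Y)\big]
   + \mathbb{E}\|m(X)-Y\|^2 \\
  &= \mathbb{E}\|f(X)-m(X)\|^2 + \mathbb{E}\|m(X)-Y\|^2,
\end{align*}
```

since conditioning on $X$ gives $\mathbb{E}[(f(X)-m(X))^{\top}(m(X)-Y) \mid X] = (f(X)-m(X))^{\top}(m(X)-\mathbb{E}[Y \mid X]) = 0$. Hence minimizing the risk over $f$ with an $N$-point codebook amounts to minimizing the quantization distortion $\mathbb{E}\|f(X)-m(X)\|^2$, i.e., clustering the values of $m(X)$.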
An example is the situation where $X$ is the set of meteorological measurements on a given day and $Y$ is the vector of maximum temperatures on the next day at the European capitals. If we are allowed to transmit forecasts of only $k$ bits to the capitals, then the forecasts may take at most $2^k$ values. These values can be interpreted as the possible maps of temperatures. Linder, Lugosi, and Zeger [8] consider the related problem of noisy sources, where only an outcome that is the source corrupted by some additive noise is observed.
The finite-sample bounds in Section II have consequences for the
rate of convergence of the regression estimate obtained via empirical
Manuscript received February 18, 2006; revised May 7, 2007. The work of
L. Györfi was supported by the Computer and Automation Research Institute of
the Hungarian Academy of Sciences. The work of M. Wegkamp was supported
by the NSF under Grants DMS 0406049 and DMS 0706829.
L. Györfi is with the Department of Computer Science and Information Theory, Budapest University of Technology and Economics, Budapest H-1117, Hungary (e-mail: gyorfi@szit.bme.hu).
M. H. Wegkamp is with the Department of Statistics, Florida State University,
Tallahassee, FL 32306-4330 USA (e-mail: wegkamp@stat.fsu.edu).
Communicated by P. L. Bartlett, Associate Editor for Pattern Recognition,
Statistical Learning and Inference.
Digital Object Identifier 10.1109/TIT.2007.913565
error minimization where the response variable $Y$ can be unbounded. For example, an important special case of the regression problem is when $Y$ is the function $m$ of $X$ with additive noise:
$$Y = m(X) + W$$
where $W$ is a zero-mean Gaussian vector independent of $X$. Based on data $D_n = \{(X_1, Y_1), \ldots, (X_n, Y_n)\}$, which consists of $n$ independent copies of the pair $(X, Y)$, a natural estimate of $f^*$ is given by the least squares estimate $f_n$ that minimizes the empirical mean squared error
$$\frac{1}{n} \sum_{i=1}^{n} \|f(X_i) - Y_i\|^2$$
over $f$ in some subclass $\mathcal{F}$ of measurable functions with codebooks in $\mathbb{R}^d$ of size $N$, i.e.,
$$f_n = \operatorname*{arg\,min}_{f \in \mathcal{F}} \frac{1}{n} \sum_{i=1}^{n} \|f(X_i) - Y_i\|^2. \qquad (1)$$
Notice that $f_n$ is not the empirical version of $f^*$.
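A minimal numerical sketch of an estimate of the form (1), under assumptions of our own choosing (one-dimensional $X$, a fixed interval partition, Gaussian noise): minimizing the empirical mean squared error over piecewise-constant functions with a codebook of size $N$ reduces to a weighted $k$-means problem on the cell averages, solved here by Lloyd-style alternation. All names and design choices below are ours, not the paper's.

```python
import numpy as np

rng = np.random.default_rng(1)
n, n_cells, N = 2000, 20, 3
X = rng.uniform(0, 1, size=n)
Y = np.sin(2 * np.pi * X) + rng.normal(scale=0.2, size=n)

cell = np.minimum((X * n_cells).astype(int), n_cells - 1)  # partition cell of each X_i
counts = np.bincount(cell, minlength=n_cells)
ybar = np.bincount(cell, weights=Y, minlength=n_cells) / np.maximum(counts, 1)

# Lloyd-style alternation: for fixed codewords, each cell picks its nearest
# codeword (weighted by cell count); for fixed assignments, each codeword
# becomes the weighted average of its cells.
code = ybar[:N].copy()                      # initial codebook of size N
for _ in range(50):
    assign = np.argmin((ybar[:, None] - code[None, :]) ** 2, axis=1)
    for k in range(N):
        w = counts * (assign == k)
        if w.sum() > 0:
            code[k] = np.sum(w * ybar) / w.sum()

f_vals = code[assign]                       # value of f_n on each cell
risk = np.mean((f_vals[cell] - Y) ** 2)     # empirical mean squared error
```

The resulting $f_n$ takes at most $N = 3$ values, and its empirical risk is well below that of the best constant predictor.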
The remainder of this article is organized as follows. Section II presents the main results that concern the rate of convergence of the risk $\mathbb{E}\|f_n(X) - Y\|^2$ to the optimal risk
$$\inf_{f \in \mathcal{F}_N} \mathbb{E}\|f(X) - Y\|^2$$
where $\mathcal{F}_N$ is the class of measurable functions with codebooks in $\mathbb{R}^d$ of size $N$. We make the blanket assumption that the minimizers $f^*$ and $f_n$ of the risk and the empirical risk, respectively, exist, as the argument remains essentially the same for near minimizers in the slightly more general situation. We present two inequalities (Theorems 1 and 2) for the regret
$$\mathbb{E}\|f_n(X) - Y\|^2 - \mathbb{E}\|f^*(X) - Y\|^2$$
under increasingly strong, but still very general, conditions, and we show that the rates of convergence improve accordingly. Applied to the (unrestricted) nonparametric least squares regression case, our main result improves considerably upon Györfi et al. [6, Th. 11.5].
Section III particularizes the developed theory to quantization of partitioning regression estimates. We utilize the fact that $f^*$ has the form
$$f^*(x) = q^*(m(x))$$
i.e., $f^*$ is the quantization of the regression function $m$ by an optimal $N$-point quantizer $q^*$. We propose two types of estimators. The first one estimates $m$ first by some $m_n$ and subsequently finds an optimal empirical quantizer of this estimate. Though the selection error of this estimate is low, its approximation error is difficult to grasp, as it depends in a delicate way on the estimate, the regression function, and the boundaries of the quantizer domains. The second type of estimator discussed in this paper is of the form (1) for a large class of functions taking only $N$ values in $\mathbb{R}^d$. It minimizes the empirical risk directly over this class, and as such it does not quantize some estimate of $m$.
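The first (plug-in) estimator type described above can be sketched as follows, again under illustrative assumptions of our own (interval partition, Gaussian noise, Lloyd iterations as the empirical quantizer); none of the concrete choices come from the paper.

```python
import numpy as np

rng = np.random.default_rng(2)
n, n_cells, N = 2000, 20, 3
X = rng.uniform(0, 1, size=n)
Y = np.sin(2 * np.pi * X) + rng.normal(scale=0.2, size=n)

# Step 1: partitioning regression estimate m_n (cell averages of Y).
cell = np.minimum((X * n_cells).astype(int), n_cells - 1)
counts = np.bincount(cell, minlength=n_cells)
m_n = np.bincount(cell, weights=Y, minlength=n_cells) / np.maximum(counts, 1)

# Step 2: empirical N-point quantizer of the estimate, i.e. Lloyd (k-means)
# iterations on the sample values m_n(X_i).
vals = m_n[cell]
code = np.quantile(vals, [0.2, 0.5, 0.8])   # initial codewords
for _ in range(50):
    assign = np.argmin((vals[:, None] - code[None, :]) ** 2, axis=1)
    for k in range(N):
        if np.any(assign == k):
            code[k] = vals[assign == k].mean()

f_vals = code[assign]                       # plug-in quantized predictions
risk = np.mean((f_vals - Y) ** 2)
```

In contrast to the second estimator type, the quantizer here never sees $Y$ directly in Step 2; it only clusters the values of the estimate $m_n$, which is what makes its approximation error delicate to analyze.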
Finally, all proofs are relegated to the Appendix.
II. RISK INEQUALITIES
In this section we show general finite-sample risk bounds, where $\mathcal{F}$ is an arbitrary class of measurable functions and $f_n$ is the least squares estimate (1).
Throughout the paper we make the following assumptions.
Condition (sG): The error $W = Y - m(X)$ is a subGaussian random variable; that is, there exist constants $\lambda_0 > 0$ and $\Lambda_0 < \infty$ with
$$\mathbb{E}\big[\exp\big(\lambda_0 \|W\|^2\big) \,\big|\, X\big] \le \Lambda_0$$
a.s. Furthermore, define and set .
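As a worked example of our own (not from the paper), the zero-mean Gaussian error of the additive-noise model satisfies such a subGaussian moment bound: if $W \sim N(0, \sigma^2 I_d)$ with $W$ independent of $X$, then conditioning on $X$ changes nothing, and for any $\lambda_0 < 1/(2\sigma^2)$,

```latex
\mathbb{E}\exp\!\big(\lambda_0 \|W\|^2\big)
  = \prod_{j=1}^{d} \mathbb{E}\exp\!\big(\lambda_0 W_j^2\big)
  = \big(1 - 2\lambda_0 \sigma^2\big)^{-d/2} < \infty ,
```

using the standard Gaussian moment generating function $\mathbb{E}\exp(t W_j^2) = (1 - 2t\sigma^2)^{-1/2}$ for $t < 1/(2\sigma^2)$; the condition thus holds with $\Lambda_0 = (1 - 2\lambda_0\sigma^2)^{-d/2}$.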
0018-9448/$25.00 © 2008 IEEE