Quantization for Nonparametric Regression

László Györfi, Fellow, IEEE, and Marten Wegkamp

Abstract—The authors discuss quantization or clustering of nonparametric regression estimates. The main tools developed are oracle inequalities for the rate of convergence of constrained least squares estimates. These inequalities yield fast rates both for nonparametric (unconstrained) least squares regression and for the clustering of partitioning regression estimates and plug-in empirical quantizers. The bounds on the rate of convergence generalize known results for bounded errors to subGaussian errors.

Index Terms—Regression estimation with restriction, least squares estimates, vector quantization, finite-sample bounds.

Manuscript received February 18, 2006; revised May 7, 2007. The work of L. Györfi was supported by the Computer and Automation Research Institute of the Hungarian Academy of Sciences. The work of M. Wegkamp was supported by the NSF under Grants DMS 0406049 and DMS 0706829.

L. Györfi is with the Department of Computer Science and Information Theory, Budapest University of Technology and Economics, Budapest H-1117, Hungary (e-mail: gyorfi@szit.bme.hu).

M. H. Wegkamp is with the Department of Statistics, Florida State University, Tallahassee, FL 32306-4330 USA (e-mail: wegkamp@stat.fsu.edu).

Communicated by P. L. Bartlett, Associate Editor for Pattern Recognition, Statistical Learning and Inference.

Digital Object Identifier 10.1109/TIT.2007.913565

I. INTRODUCTION

The main aim of multivariate regression analysis is to predict the response $Y \in \mathbb{R}^{d'}$ given a feature $X \in \mathbb{R}^{d}$. In most cases, assuming that $\mathbb{E}\|Y\|^2 < \infty$, where $\|\cdot\|$ denotes the Euclidean norm, this is achieved via (an estimate of) the regression function

$$ m(x) := \mathbb{E}[\, Y \mid X = x \,] $$

which minimizes the mean squared error $\mathbb{E}\|Y - f(X)\|^2$ over all measurable $f$, in view of the decomposition

$$ \mathbb{E}\|Y - f(X)\|^2 = \mathbb{E}\|Y - m(X)\|^2 + \mathbb{E}\|m(X) - f(X)\|^2 . $$

We do not impose any restrictions on the probability distribution of $(X, Y)$; the coordinates of $X$ may have various types of distributions: some of them may be discrete (for example, binary), others may be absolutely continuous.

An added complication in this note is that the candidates $f$ each have a finite codebook of size $N$, that is, a collection of $N$ distinct vectors in $\mathbb{R}^{d'}$. In this data compression setting we seek the function $q^*$ with a codebook of size $N$ that minimizes the risk $\mathbb{E}\|Y - q(X)\|^2$ over all measurable $q$ with codebooks of size $N$. Equivalently, in view of the mean squared error decomposition above, we try to cluster $m(X)$ into $N$ groups without ever observing $m(X)$; we observe $Y$ instead.

An example is the situation where $X$ is the set of meteorological measurements on a given day and $Y$ is the vector of maximum temperatures on the next day at the European capitals. If we are allowed to transmit forecasts of only $b$ bits to the capitals, then the forecasts may take at most $N = 2^b$ values. These $N$ values can be interpreted as the $N$ possible maps of temperatures. Linder, Lugosi, and Zeger [8] consider the related problem of noisy sources, where only an outcome $X$, that is, the source $Y$ corrupted by some additive noise, is observed.

The finite-sample bounds in Section II have consequences for the rate of convergence of the regression estimate obtained via empirical error minimization when the response variable $Y$ may be unbounded. For example, an important special case of the regression problem is when $Y$ is the function $m(X)$ with additive noise:

$$ Y = m(X) + W $$

where $W$ is a zero-mean Gaussian vector independent of $X$.
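The decomposition above is the engine of the whole note: because the cross term $\mathbb{E}\langle Y - m(X), m(X) - f(X)\rangle$ vanishes, minimizing the risk over codebook-constrained predictors amounts to clustering the unobserved $m(X)$. The following minimal Python sketch checks the identity by Monte Carlo; the choices of $m$, the competitor $f$, and the noise level are illustrative assumptions, not taken from the paper.

```python
import numpy as np

rng = np.random.default_rng(0)

def m(x):                                   # regression function (illustrative)
    return np.sin(2 * np.pi * x)

def f(x):                                   # a crude two-value (1-bit) competitor
    return np.where(x < 0.5, -0.5, 0.5)

n = 200_000
X = rng.uniform(0.0, 1.0, size=n)
Y = m(X) + rng.normal(0.0, 0.3, size=n)     # Y = m(X) + W, W zero-mean Gaussian

mse_f    = np.mean((Y - f(X)) ** 2)         # E||Y - f(X)||^2
mse_m    = np.mean((Y - m(X)) ** 2)         # E||Y - m(X)||^2  (pure noise term)
approx_f = np.mean((m(X) - f(X)) ** 2)      # E||m(X) - f(X)||^2

# The two sides agree up to Monte Carlo error, since the cross term
# 2 E<Y - m(X), m(X) - f(X)> vanishes.
print(mse_f, mse_m + approx_f)
```

Here $f$ already takes only two values, so it is of the codebook-constrained form studied in this note with $N = 2$.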
Based on data $D_n = (X_1, Y_1), \ldots, (X_n, Y_n)$ that consist of independent copies of the pair $(X, Y)$, a natural estimate of $q^*$ is given by the least squares estimate $\hat q_n$ that minimizes the empirical mean squared error over $f$ in some subclass $\mathcal{F}$ of measurable functions with codebooks in $\mathbb{R}^{d'}$ of size $N$, i.e.,

$$ \hat q_n := \arg\min_{f \in \mathcal{F}} \frac{1}{n} \sum_{i=1}^{n} \|Y_i - f(X_i)\|^2 . \qquad (1) $$

Notice that $\hat q_n$ is not the empirical version of $q^*$: the criterion in (1) is based on the observed responses $Y_i$, not on the unobserved values $m(X_i)$ that we ultimately wish to cluster.

The remainder of this article is organized as follows. Section II presents the main results that concern the rate of convergence of the risk $\mathbb{E}\|Y - \hat q_n(X)\|^2$ to the optimal risk

$$ \inf_{q \in \mathcal{Q}_N} \mathbb{E}\|Y - q(X)\|^2 $$

where $\mathcal{Q}_N$ is the class of measurable functions with codebooks in $\mathbb{R}^{d'}$ of size $N$. We make the blanket assumption that the minimizers $q^*$ and $\hat q_n$ of the risk and of the empirical risk, respectively, exist; the argument remains essentially the same for near minimizers in the slightly more general situation. We present two inequalities (Theorems 1 and 2) for the regret

$$ \mathbb{E}\|Y - \hat q_n(X)\|^2 - \inf_{q \in \mathcal{Q}_N} \mathbb{E}\|Y - q(X)\|^2 $$

under increasingly stronger, but still very general, conditions, and we show that the rates of convergence improve accordingly. Applied to the (unrestricted) nonparametric least squares regression case, our main result improves considerably upon Györfi et al. [6, Th. 11.5].

Section III particularizes the developed theory to quantization of partitioning regression estimates. We utilize the fact that $q^*$ has the form

$$ q^*(x) = \arg\min_{c \in \mathcal{C}^*} \|m(x) - c\| $$

for an optimal codebook $\mathcal{C}^*$ of size $N$, i.e., $q^*$ is the quantization of the regression function $m$. We propose two types of estimators. The first one estimates $m$ first by some estimate $m_n$ and subsequently finds an optimal empirical quantizer of this estimate. Though the selection error of this estimate is low, its approximation error is difficult to grasp, as it depends in a delicate way on the estimate $m_n$, the regression function $m$, and the boundaries of the quantizer domains of $q^*$. The second type of estimator discussed in this paper is of the form (1) for a large class $\mathcal{F}$ of functions taking only $N$ values in $\mathbb{R}^{d'}$. It minimizes (1) directly over this class, and as such it does not quantize some estimate of $m$. Finally, all proofs are relegated to the Appendix.

II. RISK INEQUALITIES

In this section we show general finite-sample risk bounds in which $\mathcal{F}$ is an arbitrary class of measurable functions and $\hat q_n$ is the corresponding least squares estimate (1). Throughout the paper we make the following assumptions.

Condition (sG): The error $\varepsilon := Y - m(X)$ is a subGaussian random variable; that is, there exist constants $\lambda_0 > 0$ and $\Lambda_0 < \infty$ with

$$ \mathbb{E}\big[ \exp\big( \lambda_0 \|\varepsilon\|^2 \big) \,\big|\, X \big] \le \Lambda_0 \quad \text{a.s.} $$

Furthermore, we define and fix constants in terms of $\lambda_0$ and $\Lambda_0$ that appear in the risk bounds below.
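To make the objects in (1) and the plug-in construction of Section III concrete, here is a minimal Python sketch of the first type of estimator: a partitioning regression estimate $m_n$ (cell averages over a fixed partition of $[0, 1)$), followed by an empirical quantizer of its fitted values computed with Lloyd iterations. The data-generating model, the cell width `h`, the bit budget `b`, and the helper names `lloyd` and `q_hat` are illustrative assumptions, not specifications from the paper.

```python
import numpy as np

rng = np.random.default_rng(1)

# Synthetic data from Y = m(X) + W with Gaussian (hence subGaussian) noise.
def m(x):
    return np.sin(2 * np.pi * x)

n = 5000
X = rng.uniform(0.0, 1.0, size=n)
Y = m(X) + rng.normal(0.0, 0.3, size=n)

# Step 1: partitioning regression estimate m_n, i.e., the average of the
# responses falling in each cell of a fixed partition of [0, 1).
h = 0.05                                    # cell width (assumed, not tuned)
n_cells = int(np.ceil(1.0 / h))
cells = np.clip((X / h).astype(int), 0, n_cells - 1)
m_hat = np.zeros(n_cells)
for j in range(n_cells):
    mask = cells == j
    if mask.any():
        m_hat[j] = Y[mask].mean()

# Step 2: empirical quantizer of the fitted values via Lloyd iterations,
# alternating nearest-codevector assignment and codebook re-centering.
def lloyd(values, N, iters=50):
    codebook = np.quantile(values, np.linspace(0.1, 0.9, N))  # spread-out init
    for _ in range(iters):
        assign = np.argmin(np.abs(values[:, None] - codebook[None, :]), axis=1)
        for k in range(N):
            if (assign == k).any():
                codebook[k] = values[assign == k].mean()
    return codebook

b = 2                     # bit budget for the transmitted forecast
N = 2 ** b                # codebook size, as in the Introduction
codebook = lloyd(m_hat[cells], N)

def q_hat(x):
    """Plug-in quantized predictor: nearest codevector to m_n(x)."""
    j = np.clip((x / h).astype(int), 0, n_cells - 1)
    fitted = m_hat[j]
    return codebook[np.argmin(np.abs(fitted[:, None] - codebook[None, :]), axis=1)]

print("codebook:", np.sort(codebook))
print("empirical risk:", np.mean((Y - q_hat(X)) ** 2))
```

With $b = 2$ the predictor transmits at most $N = 4$ distinct forecast values, as in the temperature example of the Introduction. The delicate point noted above is visible here as well: the quality of `q_hat` depends both on how well `m_hat` tracks $m$ and on where the Lloyd codebook places its cell boundaries.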