Necessary and Sufficient Conditions for Zero-Rate Density Estimation

Jorge F. Silva
Department of Electrical Engineering
Universidad de Chile
josilva@ing.uchile.cl

Milan S. Derpich
Electronic Engineering Department
Universidad Técnica Federico Santa María
milan.derpich@usm.cl

Abstract: This work addresses the problem of universal density estimation under an operational data-rate constraint. We present a coding theorem that gives necessary and sufficient conditions to learn and transmit a memoryless source distribution with arbitrary precision (in total variation), under an asymptotic zero-rate regime, in bits per sample. In the process, we propose a concrete coding scheme to achieve this learning objective, adopting the Skeleton estimate developed by Y. Yatracos [1], [2].

I. INTRODUCTION

This work studies the problem of universal density estimation under an operational data-rate constraint. The basic setting consists of an agent (the sensor) observing i.i.d. samples from an unknown distribution $\mu$, with the objective of jointly learning and transmitting a finite description of $\mu$ to a second agent (the receiver), which decodes that information to construct an estimate $\hat{\mu}$. This density estimation and coding problem has attracted the attention of the community because of its role in sensor networks, and because of its strong connection with universal lossy source coding (ULSC) [3], [4].

Echoing the seminal work of Rissanen [3], it is well understood that the problem of universal lossless source coding is connected with the problem of distribution estimation, as there exists a one-to-one correspondence between prefix-free codes and finite-entropy discrete distributions (models) in the finite-alphabet case [4]. This interplay, however, is less obvious when we move to the lossy source coding scenario.
Addressing this issue, Raginsky [5] has recently established results that connect the problem of fixed-rate universal lossy source coding with the problem of transmitting the source distribution with arbitrary precision, from one point to another, under an asymptotically zero-rate operational constraint. This connection was made under a two-stage joint modeling and coding framework [5]. Taking ideas from statistical learning, the data are split into training and test samples: in the first stage, the training data are used to construct a finite description of the source distribution, and in the second stage the bits produced by the first stage are used to pick a lossy source code matched to the estimated distribution, which then encodes the test data. Remarkably, in this joint modeling-coding framework, the existence of a zero-rate consistent estimate of the distribution (in total variation) is sufficient to show the existence of a universal fixed-rate source coding scheme achieving the Shannon distortion-rate function [4], for any given rate and for any distribution within a bounded parametric family satisfying some regularity conditions [5, Th. 3.2]. This raises the question of whether there are broader (non-parametric) families of measures for which this result is also valid.

In this work we study in greater detail the problem of universal density estimation under an asymptotically zero-rate constraint. Our main result is a coding theorem that gives necessary and sufficient conditions to guarantee that zero rate is achievable for this learning-coding problem. Interestingly, there is a tight connection with the rich non-parametric collection of $L_1$-totally bounded densities [6]. Furthermore, to achieve our coding objective we propose a concrete coding scheme that adopts the Skeleton estimate developed by Yatracos [1], [2], [6]. This is a concrete demonstration of the information-theoretic attributes of the Skeleton estimate, something that was mentioned by Devroye and Lugosi [6, Ch.
7.1] and which, to the best of our knowledge, has not been presented before. In the parametric scenario considered in [5], the Skeleton scheme offers an optimal learning rate of $O(\sqrt{1/n})$ under the zero-rate regime and, furthermore, this rate is extended to general non-parametric families.

II. DENSITY ESTIMATION UNDER A BIT RATE CONSTRAINT

Let $\mathbb{X} \in \mathcal{B}(\mathbb{R}^d)$ be a separable and complete subset of $\mathbb{R}^d$ (i.e., $\mathbb{X}$ is a Polish subspace of $\mathbb{R}^d$). Let $\mathcal{P}(\mathbb{X})$ be the collection of probability measures on $(\mathbb{X}, \mathcal{B}(\mathbb{X}))$, and let $\mathcal{AC}(\mathbb{X}) \subset \mathcal{P}(\mathbb{X})$ denote the set of probability measures that are absolutely continuous with respect to the Lebesgue measure $\lambda$ [7].¹ For any $\mu \in \mathcal{AC}(\mathbb{X})$, $\frac{\partial \mu}{\partial \lambda}(x)$ denotes the Radon-Nikodym (RN) derivative of $\mu$ with respect to $\lambda$.

For the estimation problem, the fidelity criterion adopted is the total variation distance. Let $v$ and $\mu$ be two probability measures in $\mathcal{P}(\mathbb{X})$. The total variation distance between $v$ and $\mu$ is given by

$$V(\mu, v) = \sup_{A \in \mathcal{B}(\mathbb{X})} \left| \mu(A) - v(A) \right|. \tag{1}$$

¹ A measure $\sigma$ is absolutely continuous with respect to a measure $\mu$, denoted by $\sigma \ll \mu$, if $\sigma(A) = 0$ for every event $A$ such that $\mu(A) = 0$. Consequently, $\frac{\partial \sigma}{\partial \mu}$ is well defined; it is called the Radon-Nikodym derivative, or density, and furthermore, $\forall A \in \mathcal{B}(\mathbb{X})$, $\sigma(A) = \int_A \frac{\partial \sigma}{\partial \mu} \, \partial \mu$.
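To make definition (1) concrete, the following is a minimal sketch under a simplifying assumption that is not part of the paper's setting: the alphabet is finite, so every subset is an event and the supremum can be evaluated by brute force. It compares that supremum with half the L1 distance between the two probability mass functions, a standard identity for total variation. The function names `tv_sup_over_events` and `tv_half_l1` and the example distributions are illustrative only.

```python
from itertools import combinations


def tv_sup_over_events(mu, v):
    """Brute-force evaluation of sup_A |mu(A) - v(A)| over all subsets A
    of a finite alphabet {0, ..., n-1} (illustrative; exponential in n)."""
    n = len(mu)
    best = 0.0
    for size in range(n + 1):
        for event in combinations(range(n), size):
            gap = abs(sum(mu[i] for i in event) - sum(v[i] for i in event))
            best = max(best, gap)
    return best


def tv_half_l1(mu, v):
    """Total variation computed as half the L1 distance between the pmfs."""
    return 0.5 * sum(abs(a - b) for a, b in zip(mu, v))


# Hypothetical example distributions on a three-letter alphabet.
mu = [0.5, 0.3, 0.2]
v = [0.2, 0.3, 0.5]

print(tv_sup_over_events(mu, v))
print(tv_half_l1(mu, v))
```

In this example the supremum is attained at the event $A = \{x : \mu(x) > v(x)\}$, which here is the first letter alone. For measures in $\mathcal{AC}(\mathbb{X})$ the analogous identity replaces the sum by an integral of the absolute difference of the densities with respect to $\lambda$.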