Thinning and the Law of Small Numbers

Peter Harremoës
Centrum voor Wiskunde en Informatica
P.O. 94079, 1090 GB Amsterdam, The Netherlands
P.Harremoes@cwi.nl

Oliver Johnson
Dept. Math., Univ. Bristol
University Walk, Bristol, BS8 1TW, United Kingdom
O.Johnson@bristol.ac.uk

Ioannis Kontoyiannis
Athens Univ. of Econ. & Business
Patission 76, Athens 10434, Greece
yiannis@aueb.gr

Abstract— The "thinning" operation on a discrete random variable is the natural discrete analog of scaling a continuous variable, i.e., multiplying it by a constant. We examine the role and properties of thinning in the context of information-theoretic inequalities for Poisson approximation. The classical Binomial-to-Poisson convergence, sometimes referred to as the "law of small numbers," is seen to be a special case of a thinning limit theorem for convolutions of discrete distributions. A rate of convergence is also provided for this limit. A Nash equilibrium is established for a channel game, where Poisson noise and a Poisson input are optimal strategies. Our development partly parallels the development of Gaussian inequalities leading to the information-theoretic version of the central limit theorem.

I. INTRODUCTION

Approximating the distribution of a sum of weakly dependent discrete random variables by a Poisson distribution is a well-studied subject in probability; see [1] for an extensive account. Strong connections between these results and information-theoretic techniques were established in [2], [3]; see also [4]. For the special case of approximating a Binomial distribution by a Poisson, the sharpest results to date are established via these techniques combined with Pinsker's inequality [5]–[7], at least for most of the parameter values.

Given $\alpha \in (0, 1)$ and a discrete random variable $Y$ with distribution $P$ on $\mathbb{N}_0 = \{0, 1, 2, \ldots\}$, the $\alpha$-thinning of $P$ is the distribution $T_\alpha(P)$ of the sum

  $\sum_{n=1}^{Y} X_n$, where $X_1, X_2, \ldots$ are i.i.d. Bernoulli$(\alpha)$,   (1)

and where $Y$ is assumed to be independent of the $\{X_i\}$.

In this work we show that the thinning operation can be used to formulate a version of the law of small numbers, in a way that naturally resembles the classical formulation of the central limit theorem. In particular, the "thinning" law of small numbers we develop gives a Poisson limit theorem for sums of i.i.d. random variables, and not for triangular arrays. These results are shown to hold in total variation as well as in information divergence, and explicit rates of convergence are obtained. Thinning is also shown to be useful in the context of a discrete mutual information game, where the optimal strategies for both sender and jammer are given by the Poisson distribution.

The central limit theorem has been established in the strong sense of information divergence in [8]; see also [9] and the references therein. The main results of this paper can be seen as analogous theorems for Poisson convergence.

II. THINNING

The thinning operation was introduced by Rényi in [10] in connection with the characterization theory of the Poisson process. Let $\alpha \in (0, 1)$ and let $P$ be a distribution on $\mathbb{N}_0 = \{0, 1, 2, \ldots\}$. The $\alpha$-thinning of $P$ is the distribution $T_\alpha(P)$ of the sum (1). An explicit representation of $T_\alpha(P)$ is given by

  $T_\alpha(P)(k) = \sum_{l=k}^{\infty} P(l) \binom{l}{k} \alpha^k (1-\alpha)^{l-k}$,  $k \geq 0$.

It is immediate from the definition that the thinning of a sum of independent random variables is the convolution of the corresponding thinnings.

Example 1: Thinning conserves the set of Bernoulli sums. That is, the thinned version of the distribution of a finite sum of Bernoulli random variables (with possibly different parameters) is also such a sum. This follows from the last remark above together with the observation that the $\alpha$-thinning of a Bernoulli$(p)$ random variable is the Bernoulli$(\alpha p)$ distribution.
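The explicit representation of the thinning can be checked numerically for distributions with finite support. The sketch below (the helper `thin` is hypothetical, not from the paper) evaluates $T_\alpha(P)(k)$ directly from the sum and confirms the Bernoulli observation of Example 1:

```python
from math import comb, isclose

def thin(P, alpha, k):
    """T_alpha(P)(k) = sum_{l >= k} P(l) C(l, k) alpha^k (1 - alpha)^(l - k),
    for P a dict mapping the points of a finite support to probabilities."""
    return sum(p_l * comb(l, k) * alpha**k * (1 - alpha)**(l - k)
               for l, p_l in P.items() if l >= k)

# Example 1: a Bernoulli(p) law thins to Bernoulli(alpha * p).
p, alpha = 0.3, 0.5
bern = {0: 1 - p, 1: p}
assert isclose(thin(bern, alpha, 0), 1 - alpha * p)  # mass at 0 is 1 - alpha*p
assert isclose(thin(bern, alpha, 1), alpha * p)      # mass at 1 is alpha*p
```

Since thinning turns convolution of distributions into convolution of the thinnings, the same check extends to any finite sum of Bernoulli variables with distinct parameters.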
Example 2: Thinning conserves the Poisson law, in that $T_\alpha(\mathrm{Po}(\lambda)) = \mathrm{Po}(\alpha\lambda)$:

\begin{align*}
T_\alpha(\mathrm{Po}(\lambda))(k)
&= \sum_{l=k}^{\infty} \mathrm{Po}(\lambda, l) \binom{l}{k} \alpha^k (1-\alpha)^{l-k} \\
&= \sum_{l=k}^{\infty} \frac{\lambda^l}{l!} e^{-\lambda} \binom{l}{k} \alpha^k (1-\alpha)^{l-k} \\
&= \frac{e^{-\lambda}}{k!} \alpha^k \lambda^k \sum_{l=k}^{\infty} \frac{\lambda^{l-k}}{(l-k)!} (1-\alpha)^{l-k} \\
&= \frac{e^{-\lambda}}{k!} (\alpha\lambda)^k \sum_{l=0}^{\infty} \frac{(\lambda(1-\alpha))^l}{l!} \\
&= \frac{e^{-\lambda}}{k!} (\alpha\lambda)^k e^{\lambda(1-\alpha)}
= \mathrm{Po}(\alpha\lambda, k).
\end{align*}

Similarly, the $\alpha$-thinning of a geometric distribution with mean $\lambda$ is geometric with mean $\alpha\lambda$. And since the sum of $n$ i.i.d. geometric random variables has a negative Binomial distribution, the thinning of a negative Binomial distribution is also negative Binomial.