How Optimal Is Algebraic Binning Approach: A Case Study of the Turbo-Binning Scheme With Uniform and Nonuniform Sources

Jing Li (Tiffany), Zhenyu Tu and Rick S. Blum
Department of Electrical and Computer Engineering
Lehigh University, Bethlehem, PA 18105
{jingli, zht3, rblum}@ece.lehigh.edu

Abstract: This paper investigates the optimality of the binning approach in distributed source coding for both uniform and nonuniform sources. While the algebraic binning scheme is optimal for uniform sources both asymptotically and at finite lengths, it is shown that the optimality holds only asymptotically for nonuniform sources. High-performance turbo codes are used with the binning scheme on several source distributions to quantify how close the scheme can get to the theoretical limit with relatively large block sizes. For nonuniform sources, optimal code design and variable-length bin-indexes are exploited as a useful extension to the conventional binning scheme. It is shown that the two strategies combined can improve the compression rate by as much as 0.22 bit/symbol for highly biased sources.

I. INTRODUCTION

The syndrome/coset/binning scheme used in the proof of the Slepian-Wolf boundary in distributed source coding (DSC) [1] provides a generic approach for asymmetric compression, where one source is assumed to be losslessly available at the decoder (e.g., via a conventional entropy-achieving compression method) and the other is compressed as much as possible. This paper studies the optimality of the binning approach with binary memoryless sources that are either uniformly or nonuniformly distributed. That the binning scheme is optimal for uniform sources both asymptotically and at finite lengths is well established [1][2]. The case of nonuniform sources, however, is much less studied. It should be noted that nonuniform sources are not uncommon in real life. For example, many binary images (e.g.,
facsimile images) may contain as much as 76% redundancy, which corresponds to a source distribution of p_0 = 0.96 and p_1 = 0.04 [3]. For most communication and signal processing problems, it can be assumed that a front-end compression will be performed to remove the source redundancy before the intended signal processing and/or transmission. For distributed source coding, however, such preprocessing will either ruin the source correlation or make the correlation analytically intractable and, hence, is not possible.

This material is based on research supported by the Air Force Research Laboratory under agreement No. F49620-03-1-0214, by the National Science Foundation under Grant No. CCR-0112501, and by a grant from the Commonwealth of Pennsylvania, Department of Community and Economic Development, through the Pennsylvania Infrastructure Technology Alliance (PITA).

We first show that, while the generic binning concept does not make any assumption on the underlying source distribution and is in principle optimal regardless of the uniformity of the sources, in practice the algebraic binning scheme using linear codes is optimal for nonuniform sources only asymptotically. Specifically, we show that the nonuniformity in the source distribution and the geometric uniformity of a linear code (which is required by the binning construction) are two factors that oppose each other, causing a loss in compression rate unless the length of the source sequences goes to infinity. Next, we show that, by exploiting optimal code selection and variable-length bin-indexes, the suboptimality of the binning approach (for nonuniform sources) can be mitigated.

To give a quantitative feel for how much can be achieved, we explore high-performance turbo codes with the algebraic binning scheme [4][5] for several source distributions. For uniform sources, as shown in [4][5], the turbo-binning scheme can perform within 0.07 bit/symbol of the theoretical limit with fairly large block sizes.
For (highly) nonuniform sources, we show that not using the proposed strategies (i.e., optimal channel code and variable-length bin-indexes) leaves a large gap (e.g., 0.36 bit/symbol) between the achievable compression rate and the theoretical limit. Using these remedies can close the gap by as much as 0.22 bit/symbol, but the performance nevertheless remains 0.14 bit/symbol away from the limit.

The rest of the paper is organized as follows. Section II introduces the system model and the Slepian-Wolf boundary. Section III discusses the theoretical binning concept and the practical binning scheme, and analyzes their optimality with uniform and nonuniform sources. Section IV discusses the turbo-binning scheme to quantify the gap between the achievable performance and the theoretical results. Finally, Section V concludes the paper.
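
As an informal illustration of the syndrome/coset idea behind algebraic binning, the following minimal sketch uses a (7,4) Hamming code rather than the turbo codes studied in this paper. The choice of code, the toy correlation model (the source word x and the side information y differ in at most one of seven positions), and the function names `syndrome`, `encode` and `decode` are all assumptions made for this example only. The encoder sends the 3-bit syndrome of x as its bin index, and the decoder picks the member of that bin closest to y.

```python
from itertools import product

# Parity-check matrix of the (7,4) Hamming code (illustrative assumption):
# column j (1-based) is the binary expansion of j, least significant bit in row 0.
H = [[(j >> k) & 1 for j in range(1, 8)] for k in range(3)]


def syndrome(v):
    """Return H v (mod 2) as a 3-tuple of bits."""
    return tuple(sum(h * b for h, b in zip(row, v)) % 2 for row in H)


def encode(x):
    """Encoder: the bin index of the 7-bit word x is its 3-bit syndrome."""
    return syndrome(x)


def decode(s, y):
    """Decoder: recover x from its bin index s and the side information y.

    Since H(x ^ y) = s ^ H y, the difference pattern between x and y is
    identified by its syndrome, assuming x and y differ in at most one bit.
    """
    d = [a ^ b for a, b in zip(s, syndrome(y))]
    pos = d[0] + 2 * d[1] + 4 * d[2]  # 0 means y already equals x
    x_hat = list(y)
    if pos:
        x_hat[pos - 1] ^= 1  # flip the single position where y differs from x
    return tuple(x_hat)


def exhaustive_check():
    """Check lossless recovery for every x and every allowed difference pattern."""
    for x in product((0, 1), repeat=7):
        s = encode(x)
        for flip in range(8):  # flip == 0 means no difference
            y = list(x)
            if flip:
                y[flip - 1] ^= 1
            if decode(s, tuple(y)) != x:
                return False
    return True
```

In this toy setting 7 source bits are described with 3 bits. If x is uniform and the eight difference patterns are equally likely (again an assumption of the example), the conditional entropy of x given y is log2(8) = 3 bits, so the sketch is consistent with the statement that binning with linear codes can be optimal at finite lengths for uniform sources. It does not model the nonuniform case.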