Sequential Random Binning for Streaming Distributed Source Coding

Stark C. Draper, Cheng Chang, and Anant Sahai
Dept. of EECS, University of California, Berkeley, CA, 94720
{sdraper,cchang,sahai}@eecs.berkeley.edu

Abstract— Random binning arguments underlie many results in information theory. In this paper we introduce and analyze a novel type of causal random binning, "sequential" binning. This binning is used to get streaming Slepian-Wolf codes with an "anytime" character. At the decoder, the probability of estimation error on any particular symbol goes to zero exponentially fast with delay. In the non-distributed context, we show equivalent results for fixed-rate streaming entropy coding. Because of space constraints, we present full derivations only for the latter, stating the results for the distributed problem. We give bounds on error exponents for both universal and maximum-likelihood decoders.

I. INTRODUCTION

Consider the "lossless" entropy coding of a discrete memoryless source. One approach is to use a fixed-length block code and accept some probability of encoding error. Errors occur when the realized source sequence is sufficiently atypical that it is not indexed by the code. The probability of such an event can be made as small as desired by using a sufficiently long block length. This block length induces an end-to-end system delay.

An alternate approach is to use a variable-length code. These codes achieve a zero error probability by using longer codewords to encode more atypical sequences. They are characterized by variable delay: for a fixed communication rate, the more atypical the source sequence, the more bits are needed to encode it, and therefore the longer the delay before decoding.

Both fixed- and variable-length codes can be made universal over all stationary memoryless sources with an entropy lower than the target coding rate. For fixed-length codes, the encoder can simply "bin" the observed sequence.
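Before moving to the distributed setting, it may help to see fixed-length random binning in miniature. The Python sketch below is illustrative only and is not the paper's construction: it assigns each length-n source block a bin index through a shared hash (standing in for a random binning function known to both ends), and the decoder searches the received bin for the sequence of minimum empirical entropy. The hash choice, block length, and rate are all assumptions made for this toy example, and the brute-force search is only feasible at toy block lengths.

```python
import hashlib
import itertools
import math

def bin_index(seq, num_bins):
    # Shared "random" binning function: a hash of the block, reduced
    # modulo the number of bins. Encoder and decoder both know it.
    digest = hashlib.sha256(bytes(seq)).digest()
    return int.from_bytes(digest, "big") % num_bins

def empirical_entropy(seq):
    # First-order empirical entropy of a block, in bits per symbol.
    n = len(seq)
    h = 0.0
    for sym in set(seq):
        p = seq.count(sym) / n
        h -= p * math.log2(p)
    return h

def decode(received_bin, n, num_bins):
    # Minimum empirical entropy rule: among all length-n binary blocks
    # that land in the received bin, output the lowest-entropy one.
    candidates = [s for s in itertools.product((0, 1), repeat=n)
                  if bin_index(s, num_bins) == received_bin]
    return min(candidates, key=empirical_entropy)

n = 12
num_bins = 2 ** 10                 # rate 10/12 bits/symbol, below log2|alphabet| = 1
x = (0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 1, 0)   # a low-entropy source block
m = bin_index(x, num_bins)         # the encoder sends only the bin index
x_hat = decode(m, n, num_bins)
print("decoded correctly:", x_hat == x)
```

Decoding fails exactly when another block of no greater empirical entropy shares the bin, which is what drives the error probability to zero as the block length grows (for sources with entropy below the rate).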
The decoder can then use a minimum empirical entropy rule to decode universally, without knowledge of the source statistics. In the universal variable-length case, it is the encoder that traditionally does an explicit or implicit estimation of statistics so that it can assign longer codewords to less likely sequences.

Now consider lossless entropy coding in the context of Slepian-Wolf codes [6]. In Slepian-Wolf coding, we cannot use variable-rate codes to get a zero probability of error, even with known statistics. This is easiest to see by example. Suppose x is a sequence of independent identically distributed (i.i.d.) uniform binary random variables, related to y through a memoryless binary symmetric channel with crossover probability ρ < 0.5. The Slepian-Wolf sum-rate bound is H(x, y) = 1 + H(ρ) < 2. But since the individual encoders only see uniformly distributed binary sources, they do not know when the sources are behaving jointly atypically. Therefore, they have no basis on which to adjust their encoding rates. For this reason, variable-rate approaches do not yield zero-error Slepian-Wolf coding.

Motivated by work in "anytime" channel coding [5], we ask whether we can design a streaming Slepian-Wolf system. We relax the demand for zero probability of error with a random delay (as in variable-length coding) and instead ask for an exponentially decreasing probability of error at all decoding delays. To build toward this goal, we introduce a sequential binning scheme in Section II. We use it to build a streaming fixed-rate universal entropy code. Using a sequential version of a minimum entropy decoding rule, the probability of decoding error decreases exponentially in the delay for all sources with entropies below the rate of the code. In Section III, we state our results for streaming Slepian-Wolf systems under both universal and maximum-likelihood (ML) decoding. Derivations will appear in [2].
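To make the binary example concrete, the sum-rate bound can be evaluated numerically. The short sketch below uses an illustrative crossover probability ρ = 0.1 (a value chosen here, not taken from the paper) and computes H(x, y) = H(x) + H(y | x) = 1 + H(ρ) in bits, confirming it falls below the 2 bits needed to encode the two binary sources separately.

```python
import math

def h_binary(p):
    # Binary entropy function H(p), in bits.
    if p <= 0.0 or p >= 1.0:
        return 0.0
    return -p * math.log2(p) - (1.0 - p) * math.log2(1.0 - p)

rho = 0.1                          # illustrative crossover probability, rho < 0.5
sum_rate = 1.0 + h_binary(rho)     # H(x, y) = H(x) + H(y | x) = 1 + H(rho)
print(sum_rate)                    # about 1.47 bits, strictly less than 2
```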
Finally, in Section IV we discuss and illustrate some of the differences between streaming and block coding systems.

II. STREAMING ENTROPY CODING VIA SEQUENTIAL RANDOM BINNING

Source Model: A sequence of i.i.d. random symbols, x_i, i = 1, 2, ..., is observed at the encoder. The distribution of each x_i is denoted by p_x, where p_{x_i}(x) = p_x(x) for all i. At time l the encoder transmits a message m_l, which is a function of x^l = [x_1, x_2, ..., x_l], to the decoder, where m_l ∈ {1, ..., exp[R_x]}. For convenience we measure rate in nats and for simplicity we assume