Convergence Rates in Higher Order Markov Modeling of Block-Markov Sources

András György
Informatics Laboratory, Computer and Automation Research Institute, Hungarian Academy of Sciences
Lágymányosi u. 11, 1111 Budapest, Hungary
Email: gya@szit.bme.hu

Daniel A. Nagy
Dept. of Mathematics and Statistics, Queen's University
Kingston, Ontario, Canada, K7L 3N6
Email: nagydani@mast.queensu.ca

Tamás Linder
Dept. of Mathematics and Statistics, Queen's University
Kingston, Ontario, Canada, K7L 3N6
Email: linder@mast.queensu.ca

This research was supported in part by the Natural Sciences and Engineering Research Council (NSERC), the NATO Science Fellowship, and by the János Bolyai Research Scholarship of the Hungarian Academy of Sciences.

I. INTRODUCTION

In practice, lossless compression is often applied to data that is composed of fixed-length blocks of elementary symbols; examples include sequences of bytes (consisting of 8 bits) and sequences of 32-bit words (consisting of 4 bytes). As suggested by experimental data presented in [1], universal compression algorithms operating on elementary symbols (such as bits) achieve good compression over a large family of sources, while those geared towards one particular block size are restricted to that block size and its multiples.

In this paper, we investigate the relative entropy rate between a higher-order Markov model and a first-order block-Markov source. In particular, we show that as the order of the Markov model increases, this relative entropy rate converges to zero exponentially fast. Since the redundancy of a variable-length lossless code matched to a model is essentially the relative entropy between the model distribution and the actual distribution of the source, our result suggests that the redundancy incurred by modeling data on the elementary symbol level, without taking blocks into account, is acceptably low in a Markovian setting.

II. PRELIMINARIES

A sequence of random variables $\{X_n\}_{n=0}^{\infty}$ taking values in a finite alphabet $A$ is said to be a block-$N$-Markov source if the $N$-blocks $X_0^{N-1}, X_N^{2N-1}, \ldots$ form a Markov chain. The sequence $\{Y_n\}_{n=0}^{\infty}$ taking values in $A$ is called an $m$th order Markov source if for any $M \ge m$, the conditional distribution of $Y_M$ given $Y_0^{M-1}$ is the same as the conditional distribution of $Y_M$ given $Y_{M-m}^{M-1}$, and these conditional distributions do not depend on $M$. We assume that the initial segments $X_0^{N-1}$ and $Y_0^{m-1}$ are both drawn from the stationary distributions of the respective processes (which we assume to exist), so that $\{Y_n\}$ is a stationary process and $\{X_n\}$ is a block-$N$ stationary process.

The divergence rate (relative entropy rate) between the two sources is defined as usual:
$$\bar{D}(X_0^{\infty} \| Y_0^{\infty}) = \lim_{n\to\infty} \frac{1}{n} D\big(P_{X_0^{n-1}} \,\big\|\, P_{Y_0^{n-1}}\big),$$
assuming the limit exists. In our case, both sources are stationary block-Markov processes with a block size that is a multiple of both $N$ and $m$, for which the limit always exists.
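As an illustration (ours, not part of the paper), the following minimal Python sketch evaluates the normalized finite-$n$ divergence $\frac{1}{n} D(P_{X_0^{n-1}} \| P_{Y_0^{n-1}})$ by direct enumeration for two small stationary binary first-order Markov chains, and compares it with the well-known closed-form divergence rate between first-order chains. The transition matrices are arbitrary illustrative choices, the function names are ours, and logarithms are taken base 2, so rates are in bits per symbol.

```python
import itertools
import numpy as np

# Illustrative binary first-order Markov chains (rows = current symbol,
# columns = next symbol); the specific numbers are arbitrary.
P = np.array([[0.9, 0.1], [0.2, 0.8]])
Q = np.array([[0.7, 0.3], [0.4, 0.6]])

def stationary(T):
    """Stationary distribution of a transition matrix T."""
    vals, vecs = np.linalg.eig(T.T)
    v = np.real(vecs[:, np.argmax(np.real(vals))])
    return v / v.sum()

def seq_prob(T, pi, seq):
    """Probability of a symbol sequence under a stationary Markov chain."""
    p = pi[seq[0]]
    for a, b in zip(seq, seq[1:]):
        p *= T[a, b]
    return p

pi_P, pi_Q = stationary(P), stationary(Q)

# Normalized divergence (1/n) D(P_{X_0^{n-1}} || P_{Y_0^{n-1}}) by enumeration.
for n in range(1, 12):
    d = sum(
        seq_prob(P, pi_P, s) * np.log2(seq_prob(P, pi_P, s) / seq_prob(Q, pi_Q, s))
        for s in itertools.product((0, 1), repeat=n)
    )
    print(f"n = {n:2d}:  (1/n) D = {d / n:.6f} bits/symbol")

# Closed-form divergence rate between stationary first-order chains:
# D-bar = sum_i pi_P(i) sum_j P(j|i) log P(j|i)/Q(j|i).
dbar = sum(pi_P[i] * P[i, j] * np.log2(P[i, j] / Q[i, j])
           for i in range(2) for j in range(2))
print(f"limit:  D-bar = {dbar:.6f} bits/symbol")
```

For stationary first-order chains, $D(P_{X_0^{n-1}} \| P_{Y_0^{n-1}}) = D(\pi_P \| \pi_Q) + (n-1)\bar{D}$, so the printed values approach the closed-form limit as $n$ grows.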
To determine the rate loss (redundancy) in lossless coding that results from using the best $m$th order Markov model on symbols of $A$ to approximate the actual block-$N$-Markov source (with symbols in $A^N$), one has to find
$$\bar{D}_m = \min\big\{ \bar{D}(X_0^{\infty} \| Y_0^{\infty}) : \{Y_n\}_{n=0}^{\infty} \text{ is an $m$th order Markov source} \big\}.$$

In [2], the best approximating $m$th order Markov model $\{Y_n\}$ was found for a given block-$N$-Markov source $\{X_n\}$, and its divergence rate was calculated for sufficiently large $m$:

Proposition 1 ([2]): Given a block-$N$ Markov source $\{X_n\}_{n=0}^{\infty}$, the relative entropy rate $\bar{D}(X_0^{\infty} \| Y_0^{\infty})$ is minimized over all $m$th order Markov sources $\{Y_n\}_{n=0}^{\infty}$ if and only if $P_{Y_0^m} = P_{U_0^m}$, where the random vector $U_0^m = (U_0, \ldots, U_m)$ is defined by
$$U_j = X_{j-m+\tau}, \quad j = 0, \ldots, m,$$
where $\tau$ is a random variable that is uniformly distributed on $\{0, 1, \ldots, N-1\}$ and is independent of $\{X_n\}$. The resulting minimum relative entropy rate is given for all $m \ge 2N$ by
$$\bar{D}_m = I(\tau; U_m \mid U_0^{m-1}),$$
the conditional mutual information between $\tau$ and $U_m$ given $U_0^{m-1}$.

A simple consequence of this result is that as $m$ increases, the minimum relative entropy rate converges to zero, i.e., $\lim_{m\to\infty} \bar{D}_m = 0$.
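To make Proposition 1 concrete, here is a small numerical sketch (ours, not from [2]) that computes $\bar{D}_m = I(\tau; U_m \mid U_0^{m-1})$ exactly for a toy block-$N$-Markov source with $N = 2$ binary symbols per block and an arbitrarily chosen block transition matrix. It assumes the block-stationary process is extended so that the window $U_0^m = X_{\tau-m}^{\tau}$ is well defined; the distribution of such a window then depends only on the block phase $(\tau - m) \bmod N$ of its first symbol. The names `window_dist` and `dbar_m` and all matrix entries are illustrative assumptions; logarithms are base 2.

```python
import itertools
from collections import defaultdict
import numpy as np

# Toy block-Markov source: N = 2 binary symbols per block; the blocks
# (00, 01, 10, 11) form a first-order Markov chain with an arbitrary
# transition matrix, started from its stationary distribution.
N = 2
BLOCKS = list(itertools.product((0, 1), repeat=N))
M = np.array([[0.6, 0.2, 0.1, 0.1],
              [0.1, 0.5, 0.3, 0.1],
              [0.2, 0.2, 0.4, 0.2],
              [0.1, 0.3, 0.1, 0.5]])

def stationary(T):
    """Stationary distribution of a transition matrix T."""
    vals, vecs = np.linalg.eig(T.T)
    v = np.real(vecs[:, np.argmax(np.real(vals))])
    return v / v.sum()

PI = stationary(M)

def window_dist(r, length):
    """Distribution of `length` consecutive symbols whose first symbol sits
    at offset r inside a block (the block chain is stationary)."""
    k = -(-(r + length) // N)          # number of blocks covering the window
    dist = defaultdict(float)
    for bs in itertools.product(range(len(BLOCKS)), repeat=k):
        p = PI[bs[0]]
        for a, b in zip(bs, bs[1:]):
            p *= M[a, b]
        symbols = tuple(s for b in bs for s in BLOCKS[b])
        dist[symbols[r:r + length]] += p
    return dist

def dbar_m(m):
    """D-bar_m = I(tau; U_m | U_0^{m-1}) for the toy source above."""
    # Joint distribution of (tau, U_0^m): the window U_0^m = X_{tau-m}^{tau}
    # starts at block offset (tau - m) mod N, and tau is uniform on {0,...,N-1}.
    joint = {t: window_dist((t - m) % N, m + 1) for t in range(N)}
    # Marginals needed for the conditional mutual information.
    p_tu_past = defaultdict(float)   # p(tau, U_0^{m-1})
    p_u = defaultdict(float)         # p(U_0^m)
    p_past = defaultdict(float)      # p(U_0^{m-1})
    for t, dist in joint.items():
        for u, p in dist.items():
            p_tu_past[t, u[:-1]] += p / N
            p_u[u] += p / N
            p_past[u[:-1]] += p / N
    info = 0.0
    for t, dist in joint.items():
        for u, p in dist.items():
            p /= N
            if p > 0:
                info += p * np.log2(p * p_past[u[:-1]] /
                                    (p_tu_past[t, u[:-1]] * p_u[u]))
    return info

# Evaluate D-bar_m for m >= 2N, the range covered by Proposition 1.
for m in range(2 * N, 2 * N + 6):
    print(f"m = {m}:  D-bar_m = {dbar_m(m):.3e} bits")
```

Printing $\bar{D}_m$ for increasing $m$ exhibits the convergence $\lim_{m\to\infty} \bar{D}_m = 0$ stated above; the main result of this paper is that this decay is in fact exponentially fast in $m$.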