1494 IEEE TRANSACTIONS ON COMMUNICATIONS, VOL. 55, NO. 8, AUGUST 2007
A Greedy Renormalization Method
for Arithmetic Coding
Yunwei Jia, Member, IEEE, En-hui Yang, Senior Member, IEEE, Da-ke He, Member, IEEE,
and Steven Chan, Member, IEEE
Abstract—A typical arithmetic coder consists of three steps:
range calculation, renormalization, and probability model updat-
ing. In this paper, we propose and analyze from an information
theoretic point of view a greedy renormalization method, which
has two components: greedy thresholding and greedy outputting.
The method significantly reduces the computational complexity
of the renormalization step of arithmetic coding by 1) using the
greedy thresholding to minimize the number of renormalizations
required to encode a sequence and 2) using the greedy outputting to
minimize the number of operations within each renormalization.
The method is particularly suitable for binary arithmetic coding
(BAC). Two BAC algorithms based on this method are presented.
The first algorithm replaces the renormalization method in the
TOIS BAC [2] with the greedy renormalization method, and keeps
other parts of the TOIS BAC unchanged. For binary independent
and identically distributed (i.i.d.) sources with the probability of
the less probable symbol ranging from 0.01 to 0.45, over 20% gain
in speed (on average), and less than 1% loss in compression rate
(in the worst case) are observed in the experiments. The second
algorithm combines the greedy renormalization method with the
QM-Coder. On average, a 30% gain in speed and a 3% gain in
compression rate are observed in the experiments.
Index Terms—Arithmetic coding, computational complexity,
data compression, source coding.
I. INTRODUCTION
ARITHMETIC coding has been widely used in data com-
pression [1], [2]. Compared to Huffman coding [3], arith-
metic coding has two distinct advantages. First, coupled with
a good probability model of a source, arithmetic coding can
encode a sequence from the source at a rate very close to the en-
tropy rate. In contrast, the compression rate given by Huffman
coding is normally strictly above the entropy rate since each
symbol in the data sequence must be assigned a codeword of
integer length. Thus, arithmetic coding, in general, can achieve
better compression than Huffman coding. Second, arithmetic
Paper approved by S. G. Wilson, the Editor for Data Compression of the IEEE
Communications Society. Manuscript received February 20, 2004; revised May
8, 2005. This work was supported in part by the Natural Sciences and Engi-
neering Research Council of Canada under Grant RGPIN203035-98 and Grant
RGPIN203035-02, in part by the Premier’s Research Excellence Award, in part
by the Canada Foundation for Innovation, in part by the Ontario Distinguished
Research Award, and in part by the Canada Research Chairs Program. This
paper was presented in part at the 2003 Data Compression Conference, March
2003, Snowbird, UT, USA.
Y. Jia is with Advanced Micro Devices Inc., Markham, ON L3T 7X6, Canada
(e-mail: yunwei.jia@amd.com).
E. Yang is with the Department of Electrical and Computer Engineer-
ing, University of Waterloo, Waterloo, ON N2L 3G1, Canada (e-mail:
ehyang@uwaterloo.ca).
D. He is with IBM T. J. Watson Research Center, Yorktown Heights, NY
10598 USA (e-mail: dakehe@us.ibm.com).
S. Chan is with SlipStream Data, Inc., Waterloo, ON N2L 5Z5, Canada
(e-mail: schan@slipstream.com).
Digital Object Identifier 10.1109/TCOMM.2007.902534
coding is flexible in the sense that it can be easily used in con-
junction with sophisticated probability models. For example, in
the prediction by partial matching method [4], [5], arithmetic
coding is combined with adaptive Markov models of differ-
ent orders. On the other hand, combining Huffman coding with
complex probability models often proves to be a prohibitive task.
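The first advantage above can be quantified with a small numerical illustration (ours, not from the paper): for a binary source, Huffman coding must assign each symbol a codeword of at least one bit, so its rate is pinned at 1 bit/symbol, while the entropy rate can be far lower for skewed sources.

```python
import math

def binary_entropy(p):
    """Entropy of a binary i.i.d. source, in bits per symbol."""
    if p in (0.0, 1.0):
        return 0.0
    return -p * math.log2(p) - (1 - p) * math.log2(1 - p)

# Huffman coding spends a whole bit per binary symbol; arithmetic
# coding can approach the entropy rate, which shrinks as p skews.
for p in (0.5, 0.1, 0.01):
    print(f"p={p}: entropy={binary_entropy(p):.3f} bits/symbol, Huffman=1.000")
```

At p = 0.1 the entropy is about 0.469 bits/symbol, so Huffman's 1 bit/symbol is more than double the achievable rate; arithmetic coding closes most of that gap.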
The main disadvantage of arithmetic coding is its relatively
high computational complexity. It is usually slower than Huff-
man coding and other fixed-length to variable-length coding
schemes [6]. In this paper, we focus on reducing the compu-
tational complexity of arithmetic coding, with emphasis on its
software implementations. Application of the techniques pre-
sented in this paper to hardware implementation is still under
investigation.
To illustrate the computational complexity of arithmetic cod-
ing in its software implementation and to give a brief review of
related research, we examine a popular software implementa-
tion of arithmetic coding given in [2]. In their seminal paper [1],
Witten et al. described an adaptive multisymbol arithmetic
coder, which was further modified in [2]. In this paper, we will
refer to this modified version as the TOIS coder. (TOIS stands
for ACM Transactions on Information Systems in which [2]
was published.) To encode a sequence x from a finite alphabet
A = {a_1, ..., a_M} (M ≥ 2), the TOIS coder starts with an initial interval [0, N), where N is a large positive integer.
symbol in x, the TOIS coder performs the following three steps.
Step 1) From the current interval, calculate the new interval
corresponding to the next input symbol by using its
estimated probability.
Step 2) From the newly calculated interval, output one or
more code bits and normalize the interval corre-
spondingly so that its length is greater than a prede-
fined threshold. This step is called renormalization
in the literature of arithmetic coding.
Step 3) Update the probability model corresponding to the
input symbol.
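The three steps can be sketched with a textbook-style integer binary arithmetic encoder. This is our own illustrative construction in the classic Witten–Neal–Cleary style, not the exact TOIS code: the 16-bit register width, the variable names, and the simple Laplace-count model are all assumptions made for the sketch.

```python
PRECISION = 16
FULL = (1 << PRECISION) - 1          # largest register value, 0xFFFF
HALF = 1 << (PRECISION - 1)
QUARTER = 1 << (PRECISION - 2)

class Encoder:
    def __init__(self):
        self.low, self.high = 0, FULL
        self.pending = 0             # bits withheld during underflow
        self.bits = []

    def _emit(self, bit):
        self.bits.append(bit)
        self.bits.extend([1 - bit] * self.pending)
        self.pending = 0

    def encode(self, bit, c0, c1):
        # Step 1) range calculation: split [low, high] by the model.
        total = c0 + c1
        span = self.high - self.low + 1
        mid = self.low + span * c0 // total - 1
        if bit == 0:
            self.high = mid
        else:
            self.low = mid + 1
        # Step 2) renormalization: output code bits and expand the
        # interval until it is long enough.
        while True:
            if self.high < HALF:                 # next bit settled to 0
                self._emit(0)
            elif self.low >= HALF:               # next bit settled to 1
                self._emit(1)
                self.low -= HALF
                self.high -= HALF
            elif self.low >= QUARTER and self.high < HALF + QUARTER:
                self.pending += 1                # underflow: bit deferred
                self.low -= QUARTER
                self.high -= QUARTER
            else:
                break
            self.low = self.low * 2
            self.high = self.high * 2 + 1

    def finish(self):
        # Terminate by disambiguating the final interval.
        self.pending += 1
        self._emit(0 if self.low < QUARTER else 1)
        return self.bits

def encode_sequence(seq):
    enc, c0, c1 = Encoder(), 1, 1    # Laplace counts as the model
    for b in seq:
        enc.encode(b, c0, c1)
        # Step 3) model update for the symbol just coded.
        if b == 0:
            c0 += 1
        else:
            c1 += 1
    return enc.finish()
```

Encoding 100 zeros with this sketch produces only a handful of code bits, since the adaptive counts quickly skew toward the all-zero statistics.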
As shown by the experimental results in [1], the above three
steps require comparable amounts of computation. Many techniques have
been proposed to reduce the computational complexity in Steps
1) and 3) (see [1], [2], [7]–[12] and the references therein). The
QM-Coder, developed by IBM [12] and adopted by the Joint Bi-
level Image Experts Group (JBIG) [13], is a good example in
this regard. On the other hand, relatively little has been seen in
the literature to deal with the computational complexity in Step
2), the renormalization step. In this paper, we focus on reducing
the computational complexity in this step.
There are actually two tasks in the renormalization step: out-
putting code bits and expanding the current interval accordingly.
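The two tasks can be seen in isolation in a bit-at-a-time renormalization loop, sketched below. This is a simplified illustration of conventional renormalization (not the greedy method proposed in this paper); the underflow case, where the interval straddles the midpoint, is handled by real coders but omitted here for clarity.

```python
def renormalize(low, high, bits, half=1 << 15):
    """Expand the interval [low, high] over 16-bit registers until its
    leading bit is undetermined, appending each settled code bit to
    `bits`. One loop iteration performs both tasks: output a bit, then
    double the interval."""
    while True:
        if high < half:            # interval in lower half: emit 0
            bits.append(0)
        elif low >= half:          # interval in upper half: emit 1
            bits.append(1)
            low -= half
            high -= half
        else:                      # leading bit undetermined: stop
            return low, high
        low *= 2                   # expand the interval
        high = high * 2 + 1

bits = []
low, high = renormalize(0x1200, 0x1FFF, bits)   # emits [0, 0, 0, 1]
```

Here [0x1200, 0x1FFF] shares the four leading bits 0001, so four iterations run, one per output bit; this per-bit looping is exactly the cost that a faster outputting strategy would aim to reduce.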