1494 IEEE TRANSACTIONS ON COMMUNICATIONS, VOL. 55, NO. 8, AUGUST 2007

A Greedy Renormalization Method for Arithmetic Coding

Yunwei Jia, Member, IEEE, En-hui Yang, Senior Member, IEEE, Da-ke He, Member, IEEE, and Steven Chan, Member, IEEE

Abstract—A typical arithmetic coder consists of three steps: range calculation, renormalization, and probability model updating. In this paper, we propose and analyze from an information-theoretic point of view a greedy renormalization method, which has two components: greedy thresholding and greedy outputting. The method significantly reduces the computational complexity of the renormalization step of arithmetic coding by 1) using greedy thresholding to minimize the number of renormalizations required to encode a sequence and 2) using greedy outputting to minimize the number of operations within each renormalization. The method is particularly suitable for binary arithmetic coding (BAC). Two BAC algorithms based on this method are presented. The first algorithm replaces the renormalization method in the TOIS BAC [2] with the greedy renormalization method, and keeps other parts of the TOIS BAC unchanged. For binary independent and identically distributed (i.i.d.) sources with the probability of the less probable symbol ranging from 0.01 to 0.45, over 20% gain in speed (on average) and less than 1% loss in compression rate (in the worst case) are observed in the experiments. The second algorithm combines the greedy renormalization method with the QM-Coder. On average, a 30% gain in speed and a 3% gain in compression rate are observed in the experiments.

Index Terms—Arithmetic coding, computational complexity, data compression, source coding.

I. INTRODUCTION

Arithmetic coding has been widely used in data compression [1], [2]. Compared to Huffman coding [3], arithmetic coding has two distinct advantages.
First, coupled with a good probability model of a source, arithmetic coding can encode a sequence from the source at a rate very close to the entropy rate. In contrast, the compression rate given by Huffman coding is normally strictly above the entropy rate, since each symbol in the data sequence must be assigned a codeword of integer length. Thus, arithmetic coding, in general, can achieve better compression than Huffman coding. Second, arithmetic coding is flexible in the sense that it can easily be used in conjunction with sophisticated probability models. For example, in the prediction by partial matching method [4], [5], arithmetic coding is combined with adaptive Markov models of different orders. On the other hand, combining Huffman coding with complex probability models often proves to be a prohibitive task.

Paper approved by S. G. Wilson, the Editor for Data Compression of the IEEE Communications Society. Manuscript received February 20, 2004; revised May 8, 2005. This work was supported in part by the Natural Sciences and Engineering Research Council of Canada under Grant RGPIN203035-98 and Grant RGPIN203035-02, in part by the Premier's Research Excellence Award, in part by the Canada Foundation for Innovation, in part by the Ontario Distinguished Research Award, and in part by the Canada Research Chairs Program. This paper was presented in part at the 2003 Data Compression Conference, March 2003, Snowbird, UT, USA.
Y. Jia is with Advanced Micro Devices Inc., Markham, ON L3T 7X6, Canada (e-mail: yunwei.jia@amd.com).
E. Yang is with the Department of Electrical and Computer Engineering, University of Waterloo, Waterloo, ON N2L 3G1, Canada (e-mail: ehyang@uwaterloo.ca).
D. He is with IBM T. J. Watson Research Center, Yorktown Heights, NY 10598 USA (e-mail: dakehe@us.ibm.com).
S. Chan is with SlipStream Data, Inc., Waterloo, ON N2L 5Z5, Canada (e-mail: schan@slipstream.com).
Digital Object Identifier 10.1109/TCOMM.2007.902534
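The gap between Huffman coding and the entropy rate can be illustrated numerically. The following sketch (an illustration supplied here, not from the paper) computes the binary entropy function: for a binary i.i.d. source, any Huffman code must spend at least 1 bit per symbol, while the entropy rate can be far lower.

```python
import math

def binary_entropy(p):
    """Entropy in bits/symbol of a binary i.i.d. (Bernoulli(p)) source."""
    if p in (0.0, 1.0):
        return 0.0
    return -p * math.log2(p) - (1 - p) * math.log2(1 - p)

# With p = 0.05, an ideal arithmetic coder approaches the entropy rate,
# whereas Huffman coding on single symbols is stuck at 1 bit/symbol:
print(binary_entropy(0.05))   # ~0.286 bits/symbol vs Huffman's 1 bit/symbol
```

At p = 0.5 the two coincide (1 bit/symbol), which is why the advantage of arithmetic coding is most pronounced for skewed sources.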
The main disadvantage of arithmetic coding is its relatively high computational complexity. It is usually slower than Huffman coding and other fixed-length to variable-length coding schemes [6]. In this paper, we focus on reducing the computational complexity of arithmetic coding, with emphasis on its software implementations. Application of the techniques presented in this paper to hardware implementation is still under investigation.

To illustrate the computational complexity of arithmetic coding in its software implementation and to give a brief review of related research, we examine a popular software implementation of arithmetic coding given in [2]. In their seminal paper [1], Witten et al. described an adaptive multisymbol arithmetic coder, which was further modified in [2]. In this paper, we will refer to this modified version as the TOIS coder. (TOIS stands for ACM Transactions on Information Systems, in which [2] was published.) To encode a sequence x from a finite alphabet A = {a_1, ..., a_M} (M ≥ 2), the TOIS coder starts with an initial interval [0, N), where N is a large positive integer. For each symbol in x, the TOIS coder performs the following three steps.

Step 1) From the current interval, calculate the new interval corresponding to the next input symbol by using its estimated probability.
Step 2) From the newly calculated interval, output one or more code bits and normalize the interval correspondingly so that its length is greater than a predefined threshold. This step is called renormalization in the literature of arithmetic coding.
Step 3) Update the probability model corresponding to the input symbol.

As shown by the experimental results in [1], the above three steps each take the same order of computation time. Many techniques have been proposed to reduce the computational complexity in Steps 1) and 3) (see [1], [2], [7]–[12] and the references therein).
The QM-Coder, developed by IBM [12] and adopted by the Joint Bi-level Image Experts Group (JBIG) [13], is a good example in this regard. On the other hand, relatively little has been done in the literature to deal with the computational complexity in Step 2), the renormalization step. In this paper, we focus on reducing the computational complexity in this step.

There are actually two tasks in the renormalization step: outputting code bits and expanding the current interval accordingly.

0090-6778/$25.00 © 2007 IEEE
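These two tasks can be seen in isolation in the following sketch of a conventional (non-greedy) renormalization routine; the 16-bit constants and the function name are illustrative assumptions, not the paper's method.

```python
PRECISION = 16
TOP = 1 << PRECISION
HALF, QUARTER = TOP >> 1, TOP >> 2

def renormalize(low, high, pending=0):
    """Expand [low, high] until its length exceeds QUARTER, emitting the
    code bits that become determined along the way.
    Task 1: outputting code bits; task 2: expanding the interval."""
    bits = []
    while True:
        if high < HALF:                       # interval in lower half: bit 0
            bits += [0] + [1] * pending
            pending = 0
        elif low >= HALF:                     # interval in upper half: bit 1
            bits += [1] + [0] * pending
            pending = 0
            low -= HALF; high -= HALF
        elif low >= QUARTER and high < 3 * QUARTER:
            pending += 1                      # bit straddles the midpoint; defer
            low -= QUARTER; high -= QUARTER
        else:
            break
        low, high = 2 * low, 2 * high + 1     # expand (double) the interval
    return low, high, pending, bits
```

For example, renormalize(0, QUARTER - 1) doubles the interval twice and outputs the two resolved bits [0, 0]. Each pass through the loop performs both tasks bit by bit, which is the per-bit cost that a faster renormalization scheme would aim to reduce.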