Self-Synchronizing Reversible Variable Length Codes

Hiroyoshi Morita
Graduate School of Information Systems
University of Electro-Communications
Chofu, Tokyo 182-8585, Japan
Email: morita@is.uec.ac.jp

Dongzhao SUN
Graduate School of Information Systems
University of Electro-Communications
Chofu, Tokyo 182-8585, Japan
Email: daniel@math-sys.is.uec.ac.jp

Abstract: A reversible variable-length code (RVLC) is a code in which the bit stream formed by a portion of a codeword, or by the overlapped portion of two or more adjacent codewords, is not a valid codeword. A self-synchronizing RVLC allows a sequence of codewords to be decoded either backwards or forwards without any external synchronization. In this article, we present an algorithm for constructing a self-synchronizing RVLC in which a codeword functions as a sync marker. The main idea is to replace a codeword with a minimal forbidden word (MFW) of the codeword stream; the MFW then serves as a sync marker. Moreover, we obtain a lower bound on the probability that at least one such resynchronization codeword appears in an interval of constant length. Sync strings of RVLCs are discussed as well.

Keywords: reversible variable-length code (RVLC), resynchronization marker, minimal forbidden word, synchronizing sequence.

I. INTRODUCTION

One of the practical problems with noiseless variable-length source codes is synchronization slippage in the presence of channel errors. Even a single bit error in the sequence of codewords may cause catastrophic propagation of decoding errors. One way to resynchronize the decoder is to periodically insert a sync marker into the sequence of codewords to be transmitted. Here, a sync marker means a string that is neither a portion of a codeword nor the overlapped portion of two or more adjacent codewords. Prefix-free codes with a sync marker have been studied extensively [1], [2], [3], [4]. Unfortunately, such a marker does not exist for every code.
But if one exists, we can detect codeword boundaries by finding it in the transmitted bit sequence. An alternative is to use a synchronizing string, or sync string for short, that appears only as a suffix of a codeword or of several adjacent codewords in a sequence of codewords. A sync string enables the decoder to parse the bit sequence correctly into codewords after its occurrence, regardless of what bits preceded it. It is known that almost all prefix-free codes have a sync string [5].

These two approaches to resynchronization, which so far have been designed mainly for prefix variable-length codes, have led to growing interest in reversible variable-length codes as well. A reversible variable-length code (RVLC) is a code in which the bit sequence formed by a portion of a codeword, or by the overlapped portion of two or more adjacent codewords, is not a valid codeword. In other words, no codeword of an RVLC is a prefix or a suffix of another codeword. Hence, an RVLC potentially allows a sequence of codewords to be decoded either backwards or forwards if a sync marker or a sync string is detected in the sequence.

This work was partially presented at ISITA2006, Seoul, 2006.

Algorithms for constructing an RVLC from the Huffman code have been studied in [6], [7], [8]. In particular, Lakovic and Villasenor [8] considered a formal relationship between the number of available length-k codewords of an RVLC and the structure of the other codewords of length less than k, and proposed an effective algorithm to obtain a suboptimal RVLC from the Huffman code. RVLCs are used effectively in practical applications such as MPEG video transmission to reduce the effect of error propagation during transmission of the compressed video data [9].
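The RVLC property stated above (no codeword is a prefix or a suffix of another) can be checked directly, and it is what makes decoding possible in both directions. The following is a minimal sketch only: the function names and the five-word example code are illustrative assumptions and are not taken from the paper or from [8].

```python
def is_prefix_free(code):
    """Return True if no codeword is a proper prefix of a different codeword."""
    words = set(code)
    return not any(a != b and b.startswith(a) for a in words for b in words)


def is_rvlc(code):
    """A code is an RVLC if it is both prefix-free and suffix-free.

    Suffix-freeness is tested as prefix-freeness of the reversed codewords.
    """
    return is_prefix_free(code) and is_prefix_free([w[::-1] for w in code])


def decode_forward(bits, code):
    """Parse a bit string left to right; valid because the code is prefix-free."""
    words, parsed, current = set(code), [], ""
    for bit in bits:
        current += bit
        if current in words:
            parsed.append(current)
            current = ""
    if current:
        raise ValueError("bit string does not end on a codeword boundary")
    return parsed


def decode_backward(bits, code):
    """Parse a bit string right to left; valid because the code is suffix-free."""
    reversed_code = [w[::-1] for w in code]
    reversed_parse = decode_forward(bits[::-1], reversed_code)
    # Undo the reversal of each codeword and of the codeword order.
    return [w[::-1] for w in reversed_parse][::-1]


# Hypothetical example code (an assumption for illustration only).
EXAMPLE_RVLC = ["00", "11", "010", "101", "0110"]
# A Huffman-style code that is prefix-free but not suffix-free.
EXAMPLE_PREFIX_ONLY = ["0", "10", "11"]
```

With these definitions, `is_rvlc(EXAMPLE_RVLC)` holds while `is_rvlc(EXAMPLE_PREFIX_ONLY)` does not, because "0" is a suffix of "10". Both decoders recover the same parse of an error-free bit string; the sketch does not model channel errors or sync markers.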
When the decoder detects decoding errors caused by slippage, it can decode the bit sequence backwards from the point where the sync marker is found to the point where the error occurred. However, as far as the authors know, there has so far been no systematic approach for constructing a sync marker for an RVLC.

In this paper, we present a method for constructing a self-synchronizing RVLC in which some codewords function as sync markers. That is, a sync marker is used not only for resynchronization but also for conveying information about source symbols. We call these codewords sync codewords. The main idea is to replace a codeword of a given RVLC with one of the minimal forbidden words (MFWs) [11] of a sequence of codewords in which every combination of t consecutive codewords appears, where usually t = 3 in practical applications. To find sync codewords efficiently, a linear-complexity algorithm for obtaining all MFWs of a given sequence [12] is applied to this sequence of codewords. Throughout this paper, we consider only RVLCs obtained by the Lakovic and Villasenor algorithm [8], although the proposed method is applicable to any RVLC.

This paper is organized as follows. Section II gives an introduction to synchronous variable-length codes and the definition of a sync codeword in an RVLC. Then, in Section III, the definition and properties of MFWs are introduced. In Section IV, we propose a construction algorithm for synchronous RVLCs and show our experimental results. And a probabilistic analysis