Improvement on the Redundancy of the Knuth Balancing Scheme for Communication Systems

Ebenezer Esenogho, Elie N. Mambou and Hendrick C. Ferreira
Center for Telecommunication, Dept. of Electrical and Electronic Engineering Science, University of Johannesburg
P. O. Box 524, Auckland Park, 2006, South Africa
Email: {ebenezere, emambou, hcferreira}@uj.ac.za

Abstract—A simple scheme was proposed by Knuth to generate balanced codewords from a random binary information sequence. However, this method has a redundancy that is twice that of the full set of balanced codewords, which is the minimal achievable redundancy. The gap between the redundancy of Knuth's algorithm and the minimal one is considerable and can be reduced. This paper attempts to achieve this goal through a method based on information sequence candidates. The proposed scheme is suitable for various communication systems, as it generates efficient and less redundant balanced codes.

Index Terms—balanced code, redundancy, running digital sum (RDS), information sequence candidates, variable length (VL) prefix, fixed length prefix.

I. INTRODUCTION

A binary codeword of even length k is said to be balanced if it contains exactly k/2 zeros and k/2 ones. Balanced codes are very useful for digital recording of data on optical and magnetic storage disks. They can also be used to detect or correct errors in communication channels.

Donald Knuth proposed a simple and efficient scheme to generate balanced codewords [1]. This approach relies on the fact that any unbalanced binary codeword x of length k can always be encoded into a balanced one, denoted x′, by inverting the first e bits of x, where 1 ≤ e ≤ k. The index e is encoded as a prefix p, which is prepended to x′, and the result is sent through a channel.
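As an illustration only, the balancing step described above can be sketched in Python. The function names (`balancing_indices`, `knuth_encode`, `knuth_decode`) are hypothetical and do not come from [1]. The sketch assumes x is a list of 0/1 values of even length k, returns the index e as a plain integer, and leaves out how e would be encoded into the prefix p.

```python
from typing import List, Tuple


def is_balanced(bits: List[int]) -> bool:
    """True when the word has even length and as many ones as zeros."""
    return len(bits) % 2 == 0 and 2 * sum(bits) == len(bits)


def balancing_indices(x: List[int]) -> List[int]:
    """All e with 1 <= e <= k such that inverting the first e bits balances x."""
    k = len(x)
    if k == 0 or k % 2:
        raise ValueError("k must be a positive even number")
    weight = sum(x)
    found = []
    for e in range(1, k + 1):
        # Inverting bit number e changes the weight by +1 (0 -> 1) or -1 (1 -> 0).
        weight += 1 - 2 * x[e - 1]
        if 2 * weight == k:
            found.append(e)
    return found


def knuth_encode(x: List[int]) -> Tuple[int, List[int]]:
    """Return (e, x') where x' is x with its first e bits inverted.

    Assumption for this sketch: the smallest balancing index is used.
    """
    e = balancing_indices(x)[0]
    return e, [1 - b for b in x[:e]] + x[e:]


def knuth_decode(e: int, x_bal: List[int]) -> List[int]:
    """Undo the encoding by inverting the first e bits again."""
    return [1 - b for b in x_bal[:e]] + x_bal[e:]


if __name__ == "__main__":
    x = [1, 1, 1, 0, 1, 1, 0, 1]
    e, x_bal = knuth_encode(x)
    print(e, x_bal, is_balanced(x_bal))
    print(knuth_decode(e, x_bal) == x)
```

The weight of the partially inverted word changes by exactly one at each step and moves from the weight of x to k minus that weight, which is why at least one index in the range 1 ≤ e ≤ k always balances the word.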
At the receiver side, the decoder receives the codeword px′, first reads off the prefix, and is then able to recover the original information sequence x by inverting back the first e bits of x′. This algorithm is very suitable for long sequences, as it does not make use of any lookup tables at either the encoder or the decoder. The redundancy of Knuth's algorithm, p, is approximately evaluated as

$$p = \log_2 k, \quad k \gg 1. \tag{1}$$

Since then, numerous works have been published to reduce the redundancy given in (1).

In [2], two attempts to improve Knuth's balancing algorithm were presented. The first was based on the distribution of the transmitted prefix index: the basic Knuth scheme encodes the first balancing point, at position e, as the prefix, so the encoder tends to choose smaller values of the position index. It has been shown that, for equiprobable information sequences, the distribution of the index is not uniform, which yields a redundancy slightly less than (1). The second attempt used the multiplicity of inversion points within the information sequence to transmit auxiliary data. Both schemes used a variable length prefix for the chosen index; this only made a minor improvement on the redundancy of Knuth's algorithm.

The second attempt from [2] was revisited in [3]. This method was renamed bit recycling for Knuth's algorithm (BRKA); it relies on the high probability of an information sequence having more than one balancing point. In other words, this scheme uses the multiplicity of encodings to reduce the gap between the lower bound on the redundancy and the redundancy of Knuth's algorithm.

A major contribution to reducing the redundancy of Knuth's algorithm was made by Immink and Weber in [4]. Their scheme does not make use of lookup tables and presents a very efficient encoding of the index prefix for both variable and fixed length prefixes. Furthermore, the distribution of the prefix length was discussed, as well as the average efficiency of this construction.
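To give a feel for the gap that these works try to close, the following sketch compares the Knuth redundancy of (1) with the redundancy of the full set of balanced codewords, taken here as k − log2 C(k, k/2), where C(k, k/2) counts the balanced words of length k. Using the same length k in both expressions is a simplifying assumption made for this illustration, and the function names are hypothetical.

```python
import math


def knuth_redundancy(k: int) -> float:
    """Approximate prefix length of Knuth's scheme, log2(k), as in (1)."""
    if k <= 0:
        raise ValueError("k must be positive")
    return math.log2(k)


def minimal_redundancy(k: int) -> float:
    """Redundancy of the full set of balanced words of even length k."""
    if k <= 0 or k % 2:
        raise ValueError("k must be a positive even number")
    # There are C(k, k/2) balanced words, so log2 of that count is the
    # number of information bits the full set can carry.
    return k - math.log2(math.comb(k, k // 2))


if __name__ == "__main__":
    for k in (16, 64, 256, 1024):
        r_knuth = knuth_redundancy(k)
        r_min = minimal_redundancy(k)
        print(f"k={k:5d}  knuth={r_knuth:6.3f}  minimal={r_min:6.3f}  "
              f"ratio={r_knuth / r_min:5.3f}")
```

Under these assumptions the ratio grows slowly with k and approaches two, which matches the factor of two stated in the abstract.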
In this paper, we propose a modification of a scheme described in [4] to generate efficient and less redundant balanced codes. This approach is designed for communication systems that model the data as packets and do not accommodate cascading sequences, as in most information theory related applications.

The rest of this paper is organized as follows: the system model of the proposed scheme is described in Section II; then the method description based on information sequence candidates is presented in Section III. Section IV shows the decoding process. Sections V and VI present a detailed analysis as well as the performance of, and a discussion on, the redundancy of the proposed scheme. Finally, the paper is concluded in Section VII.

II. SYSTEM MODEL

Fig. 1 presents the model of the received data for two different systems. In (a), the received data is modelled as a set of balanced cascading sequences, each composed of the encoded version of the information sequence accompanied by a prefix. This is suitable for applications such as data storage, insertion, deletion, etc., because they represent a long stream of data as a set of cascading balanced codewords referring to various data blocks to be encoded sequentially.

arXiv:1711.03525v1 [cs.IT] 9 Nov 2017