International Journal of Computer Mathematics, Vol. 81, No. 8, August 2004, pp. 931–941

OPTIMAL SYNCHRONOUS CODING

DONGYANG LONG a,b,∗, WEIJIA JIA c,† and MING LI d,‡

a Department of Computer Science, Zhongshan University, Guangzhou 510275, Guangdong, PRC;
b The State Key Laboratory of Information Security, Chinese Academy of Sciences, Beijing 100039, PRC;
c Department of Computer Engineering and Information Technology, City University of Hong Kong, 83 Tat Chee Avenue, Kowloon, Hong Kong SAR, PRC;
d School of Information Science and Technology, East China Normal University, Shanghai 200062, PRC

(Received 25 June 2003)

Novel synchronous coding schemes are introduced, and the relationships between optimal synchronous codes and Huffman codes are discussed. Although the problem of the existence of optimal synchronous codes has not yet been resolved, we show that any synchronous code can be considered an optimal synchronous code for some information source alphabet. In other words, synchronous codes are almost optimal and can therefore be regarded as near optimal with respect to average code word length. It is shown that optimal synchronous codes always exist for information source alphabets with a dyadic probability distribution. Compared with Huffman coding, synchronous coding can be used not only for statistical modeling but also for dictionary methods. Like Huffman coding, it is also well suited to large information retrieval systems. Moreover, from the viewpoint of computational difficulty, it is proven that breaking a synchronous code or an optimal synchronous code is NP-complete.

Keywords: Data compression; Huffman coding; Synchronous coding; Maximal prefix code; Optimal synchronous code

C.R. Categories: F.4.3; E.4; H.1.1; I.4.2

1 INTRODUCTION

Synchronous codes, which are variable-length source codes, have been studied extensively [1, 2, 5–7, 12, 21, 25–27].
This work was partially sponsored by City U Grant 7001355, the 863 Program (Project No. 2002AA144060) and the National Natural Science Foundation of China (Project No. 60273062).
∗ Corresponding author. E-mail: issldy@zsu.edu.cn or dylong25112002@yahoo.com
† E-mail: itjia@cityu.edu.hk
‡ E-mail: mli@ee.ecnu.edu.cn
ISSN 0020-7160 print; ISSN 1029-0265 online © 2004 Taylor & Francis Ltd
DOI: 10.1080/00207160410001715285

Schützenberger has introduced synchronized prefix codes [25, 26], and the class of statistically synchronizable codes coincides with the class of synchronized prefix codes [5, 6]. In particular, Schützenberger has studied the possible distributions of lengths in a synchronized prefix code, giving necessary and sufficient conditions for a sequence of integers to be the length distribution of a synchronized prefix code. Ferguson and Rabinowitz have discussed the subclass of statistically synchronizable Huffman codes [7]. Montgomery and Abrahams [21] have considered the problem of constructing a binary prefix code that