IEEE TRANSACTIONS ON COMPUTERS, VOL. C-29, NO. 4, APRIL 1980

Correspondence

On Uniquely Decipherable Codes with Two Codewords

RONALD V. BOOK AND SAI CHOI KWAN

Abstract: It is shown that every uniquely decipherable code with just two codewords has finite delay. In addition, if a uniquely decipherable code with just two codewords is full, then it is trivial.

Index Terms: Finite delay, semigroups, uniquely decipherable codes.

Suppose that $S$ is a finite set of strings. The question of whether an arbitrary string $x$ can be expressed as the concatenation of strings in $S$ is easy to answer: construct a deterministic finite-state acceptor $M$ that recognizes exactly the strings in $S^*$, the set of strings obtained as finite concatenations of strings from $S$, and run $M$ on $x$. A second question is whether every string in $S^*$ has a unique factorization, or parse, as the concatenation of strings in $S$. For some choices of $S$ this is true and for others it is false; the question is decidable (see, e.g., [2]).

Suppose that $S$ is such that every string in $S^*$ does have a unique parse as the concatenation of strings in $S$. How difficult is it to obtain this parse? Some choices of $S$ require that any parsing algorithm have memory bounded by a function of the length of the input string, while other choices of $S$ are such that a parsing algorithm need have only finite memory. Here we consider the situation where $S$ contains just two strings, say $S = \{x, y\}$. We show that if $xy \neq yx$, then a single-scan parsing algorithm with finite memory can be used to parse strings in $S^*$.

The methods used are those of the theory of variable-length codes and of semigroups. The results are new but can be obtained as corollaries of more sophisticated theorems. In this note simple and direct proofs are provided.

In [2],[3] it is shown that variable-length codes are related to information-lossless automata.
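The decidability of unique factorization mentioned above can be illustrated with a short sketch. The text only cites [2] for a decision procedure; the version below uses the Sardinas-Patterson dangling-suffix test, a standard method swapped in here as an assumption and not necessarily the procedure of [2]. The function names `residuals` and `is_uniquely_decipherable` are hypothetical, and codewords are assumed to be ordinary Python strings.

```python
def residuals(prefixes, words):
    """Nonempty strings w with p + w == v for some p in prefixes, v in words."""
    return {v[len(p):] for p in prefixes for v in words
            if len(v) > len(p) and v.startswith(p)}


def is_uniquely_decipherable(code):
    """Sardinas-Patterson test (an assumed stand-in for the test cited as [2])."""
    code = set(code)
    if "" in code:
        return False  # the empty string can be inserted anywhere
    # Dangling suffixes left when one codeword is a proper prefix of another.
    pending = residuals(code, code)
    seen = set()
    while pending:
        w = pending.pop()
        if w in code:
            return False  # a dangling suffix is itself a codeword: two parses exist
        if w in seen:
            continue
        seen.add(w)
        # New dangling suffixes produced by matching w against the codewords.
        pending |= residuals(code, {w}) | residuals({w}, code)
    return True


print(is_uniquely_decipherable({"0", "01"}))     # True
print(is_uniquely_decipherable({"ab", "abab"}))  # False
```

Every dangling suffix is a suffix of some codeword, so the set `seen` is finite and the loop terminates for any finite code.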
Connections between the study of variable-length codes and the study of subsemigroups of free semigroups exist [8],[9]. In particular, if $\Sigma$ is a finite set of symbols, then the set $\Sigma^*$ of all strings over $\Sigma$ is the free semigroup (with identity $e$) generated by $\Sigma$, and a set $S \subseteq \Sigma^*$ is a uniquely decipherable (UD) code if the subsemigroup $S^*$ of $\Sigma^*$ generated by $S$ is a free subsemigroup and has $S$ as its minimal generating set. If $S$ is a uniquely decipherable code, then $x \in S^*$ is a message and $s \in S$ is a codeword. A code $S \subseteq \Sigma^*$ is trivial if $S \subseteq \Sigma$, i.e., each codeword is a string of length one.

Consider subsemigroups of $\Sigma^*$ generated by sets with just two elements. For any $x, y \in \Sigma^*$, the following are equivalent [1],[5],[7]:
1) $\{x, y\}^*$ is a free subsemigroup of $\Sigma^*$;
2) $xy \neq yx$;
3) there do not exist $z \in \Sigma^*$ and $p, q > 0$ such that $x = z^p$ and $y = z^q$.
This fact leads us to consider codes with just two codewords.

Manuscript received October 26, 1978; revised May 28, 1979. This research was supported in part by the National Science Foundation under Grant MCS77-11360. The authors are with the Department of Mathematics, University of California at Santa Barbara, Santa Barbara, CA 93106.

A UD code $S \subseteq \Sigma^*$ has finite delay if there exists an integer $t$ such that for any message $w \in S^*$, examining the prefix of $w$ of length at most $t$ allows one to determine the first codeword occurring in $w$'s unique factorization as a message in $S^*$. A UD code $S$ is said to have delay $k$ if $k$ is the smallest integer that has this property.

If $S$ is a UD code, then $S^*$ is a regular set, so the question of whether a given string is a concatenation of codewords from $S$ can be answered by scanning the string once from left to right with a finite-state acceptor that recognizes the strings in $S^*$. If one wishes to decode a message in $S^*$, that is, to parse a string in $S^*$ in terms of the codewords in $S$, then it may not be possible to do this using only finite memory and a single scan of the string.
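The definition of finite delay can be made concrete with a small sketch of a decoder that fixes each codeword after looking at a window of at most `t` symbols. This is only an illustration under stated assumptions: the input `w` is assumed to be a message in $S^*$, the names `can_extend`, `viable_first_codewords`, and `decode` are hypothetical, and the toy code `{"0", "01"}` is an example chosen here, not one taken from the text.

```python
def can_extend(p, code, complete):
    """True if p is in S* (complete=True), or is a prefix of some
    string in S* (complete=False), where S = code."""
    reachable = {0}  # positions i such that p[:i] is a concatenation of codewords
    for i in range(len(p) + 1):
        if i not in reachable:
            continue
        rest = p[i:]
        if rest == "":
            return True
        if not complete and any(c.startswith(rest) for c in code):
            return True  # rest could still grow into a codeword
        reachable.update(i + len(c) for c in code if rest.startswith(c))
    return False


def viable_first_codewords(w, code, t):
    """Codewords that could begin w, judged only from the window w[:t]."""
    window = w[:t]
    complete = len(w) <= t  # the window already shows the whole message
    viable = []
    for c in sorted(code):
        if window.startswith(c):
            if can_extend(window[len(c):], code, complete):
                viable.append(c)
        elif not complete and c.startswith(window):
            viable.append(c)  # window too short to rule c out
    return viable


def decode(w, code, t):
    """Parse w left to right, fixing each codeword from a window of length t."""
    parsed = []
    while w:
        viable = viable_first_codewords(w, code, t)
        if len(viable) != 1:
            raise ValueError(f"window {t} does not determine the next codeword")
        parsed.append(viable[0])
        w = w[len(viable[0]):]
    return parsed


print(decode("00100", {"0", "01"}, 2))  # ['0', '01', '0', '0']
```

For this toy code a window of length 2 is enough, while `decode("00100", {"0", "01"}, 1)` raises `ValueError` because a single symbol `0` cannot distinguish the codeword `0` from the start of `01`.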
However, if $S$ has finite delay, then one can construct a finite-state machine that will accept those strings that are in $S^*$ and give as output the unique decoding of those strings.

It is easy to see that the UD code $\{1, 110, 010\}$ with three codewords does not have finite delay: 11010101010... can be interpreted as 1-1-010-1-010-1-... or 110-1-010-1-010-... [2]. Our first result is that any UD code with two codewords has finite delay.

Theorem 1: If $S$ is a UD code with two codewords, then $S$ has finite delay.

Proof: Let $S = \{x, y\} \subseteq \Sigma^*$ be a UD code. Clearly, if neither $x$ is a prefix of $y$ nor $y$ is a prefix of $x$, then $S$ has finite delay $\le \min(|x|, |y|)$ ($|z|$ denotes the length of the string $z$). Thus, suppose $x$ is a prefix of $y$ (the case where $y$ is a prefix of $x$ is symmetric) and let $k \ge 1$ be the largest integer such that $x^k$ is a prefix of $y$. Let $u$ be the string such that $y = x^k u$ ($u \neq e$, the null string, else $yx = xy$ and $\{x, y\}$ is not a UD code). By the maximality of $k$, $x$ is not a prefix of $u$. Let $m_1 = |x|$, $m_2 = |y|$.

Suppose $S$ does not have finite delay. Then for every integer $n > 0$ there is a message $w \in S^*$ such that the first codeword in $w$'s unique factorization cannot be determined until a prefix of $w$ of length greater than $n$ is examined. In particular, there is a string whose prefix of length $m_1 + m_2$ initially has two factorizations, the first beginning with $y$ and the second beginning with $x$. The first factorization of the prefix of length $m_1 + m_2$ is $yx$, since it begins with $y$, there are only two codewords ($x$ and $y$), and $y = x^k u$. Because the second factorization begins with $x$ and $y = x^k u$, the prefix of length $m_1(k + 1)$ of the second factorization is $x^{k+1}$. By choice of $u$, $x$ is not a prefix of $u$, so $x^{k+1}$ being a prefix of $yx = x^k u x$ implies that $u$ is a prefix of $x$. Let $x = uv$ ($v \neq e$, else $xy = yx$).

Now we have the following. The first factorization of the prefix of length $m_1 + m_2$ is $yx = x^k u x = x^k u u v$, and the second factorization of the prefix of length $m_1 + m_2$ is $x^{k+1} u = x^k u v u$.
Thus $x^k u u v = x^k u v u$, and by cancellation $uv = vu$. This implies $xy = uv(uv)^k u = (uv)^k u v u = (uv)^k u u v = yx$, which contradicts the hypothesis that $\{x, y\}$ is a UD code. $\square$

Theorem 1 can also be obtained as a corollary to a result of Linna [6], which says that any UD code with infinite delay is contained in the message set of a UD code with finite delay that has strictly fewer codewords.

Lentin and Schützenberger (Corollary 1 of [4]) have established the following fact.

Proposition: A necessary and sufficient condition for two strings $x$ and $y$ to be powers of the same string is that $xy$ and $yx$ contain a common left prefix of length $|x| + |y| - \gcd(|x|, |y|)$.

0018-9340/80/0400-0324\$00.75 © 1980 IEEE
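The Proposition can be checked by brute force on short strings. The sketch below compares three conditions on every pair of nonempty strings over a two-letter alphabet up to a small length: being powers of one string, the common-prefix condition of the Proposition, and $xy = yx$. All function names are hypothetical, and the restriction to nonempty strings of length at most 5 is an assumption made to keep the search small; it is a sanity check, not a proof.

```python
from itertools import product
from math import gcd


def common_prefix_length(a, b):
    """Length of the longest common prefix of a and b."""
    n = 0
    while n < min(len(a), len(b)) and a[n] == b[n]:
        n += 1
    return n


def powers_of_same_string(x, y):
    """Direct check that x = z^p and y = z^q for one string z (x, y nonempty)."""
    g = gcd(len(x), len(y))
    z = x[:g]  # if any common root exists, this prefix is also one
    return x == z * (len(x) // g) and y == z * (len(y) // g)


def prefix_criterion(x, y):
    """The condition of the Proposition on the common prefix of xy and yx."""
    bound = len(x) + len(y) - gcd(len(x), len(y))
    return common_prefix_length(x + y, y + x) >= bound


def nonempty_strings(alphabet, max_len):
    """All nonempty strings over alphabet of length at most max_len."""
    for n in range(1, max_len + 1):
        for letters in product(alphabet, repeat=n):
            yield "".join(letters)


def check(alphabet="ab", max_len=5):
    """True if the three conditions agree on every pair of test strings."""
    words = list(nonempty_strings(alphabet, max_len))
    return all(
        powers_of_same_string(x, y) == prefix_criterion(x, y) == (x + y == y + x)
        for x in words
        for y in words
    )


print(check())  # True
```

The pair `x = "abaab"`, `y = "aba"` shows that the bound cannot be lowered by one: `xy` and `yx` share a prefix of length 6, one short of $|x| + |y| - \gcd(|x|, |y|) = 7$, and the two strings are not powers of a common string.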