IEEE TRANSACTIONS ON INFORMATION THEORY, VOL. 56, NO. 10, OCTOBER 2010 5223
Correcting Deletions Using Linear and Cyclic Codes
Khaled A. S. Abdel-Ghaffar, Member, IEEE, Hendrik C. Ferreira, Senior Member, IEEE, and
Ling Cheng, Associate Member, IEEE
Abstract—Linear and cyclic codes are typically used to combat
substitution errors. However, synchronization errors, associated
with the deletion and insertion of symbols, can cause severe per-
formance degradation unless the coding scheme possesses the ca-
pability to recover from such errors. It is shown that linear codes
of rate greater than 1/2 cannot correct deletion or insertion errors
but there are linear codes of rate 1/2 that can correct these er-
rors. Although cyclic codes, except for repetition codes, cannot cor-
rect deletion or insertion errors, two approaches are investigated
to yield codes, based on cyclic codes, that can correct these errors.
In the first approach, it is shown that a binary or nonbinary cyclic
code of rate at most 1/3 or 1/2, respectively, can be extended by
one symbol to make it capable of correcting synchronization er-
rors. In the second approach, a cyclic code of rate at most 1/2 is
expurgated by appropriately deleting codewords such that the ex-
purgated code is capable of correcting synchronization errors. It is
shown that deleting codewords costs at most two information bits
if the code is binary and one information symbol if the code is non-
binary.
Index Terms—Cyclic code, deletion, expurgated code, extended
code, insertion, linear code, substitution error, synchronization
error.
I. INTRODUCTION
IN most communication and storage channels, substitution
errors, in which a transmitted symbol is received as an-
other symbol, are the most common type of errors. For this
reason, coding techniques are widely used to combat such er-
rors. However, channels may also suffer from synchronization
errors. These errors are associated with not receiving a trans-
mitted symbol, which is called a deletion error, or with receiving
a spurious symbol that was not transmitted, which is called an
insertion error. In some applications, such as the Internet, sym-
bols, representing packets, are transported over a communica-
tion network via a set of links and nodes connecting the source
to the destination. A failure in any part of the communication
route may cause a packet to be lost, resulting in a deletion error. The
rate of packet loss ranges from 0.6% to 1.4% depending on the
distance from server to user [13]. It has also been observed that
the loss rate can be much higher, ranging between 10% and 50%,
over short periods of time [15]. Deletion and insertion errors can
have a devastating effect on the reliability of the communication
channel even if powerful codes are used to correct substitution
errors. Therefore, there is a compelling reason to consider codes
that not only correct substitution errors but can also recover
from deletion and insertion errors [3], [4], [8]–[10], [14], [18],
[21], [23]. For an interesting and accessible survey on deletion
correcting codes, the reader is referred to [20].
Manuscript received May 26, 2009; revised May 13, 2010. Date of current
version September 15, 2010. This work was supported in part by the National
Science Foundation (NSF) under Grant CCF-0727478 and in part by the Na-
tional Research Foundation (NRF) under Grant 66422. The material in this
paper was presented in part at the IEEE International Symposium on Informa-
tion Theory, Nice, France, June 24–29, 2007.
K. A. S. Abdel-Ghaffar is with the Department of Electrical and Com-
puter Engineering, University of California, Davis, CA 95616 USA (e-mail:
ghaffar@ece.ucdavis.edu).
H. C. Ferreira and L. Cheng are with the Department of Electrical and Elec-
tronic Engineering Science, University of Johannesburg, Auckland Park, 2006,
South Africa (e-mail: hcferreira@uj.ac.za; lcheng@uj.ac.za).
Communicated by M. Blaum, Associate Editor for Coding Theory.
Digital Object Identifier 10.1109/TIT.2010.2059790
In this paper, we show that linear codes of rates greater than
1/2 cannot correct a single deletion or a single insertion although
there are linear codes of rate 1/2 that can correct such errors.
This contradicts a construction by Sloane [20] of linear dele-
tion correcting codes of rate greater than 1/2. Actually, we will
show that the construction presented in [20] cannot lead to linear
codes. Our results address a question raised by Sloane [20] regarding
optimal linear single deletion correcting codes. In particular,
we determine the minimum number of check symbols
needed in a linear deletion correcting code. For example, using
computer search, it is reported in [20] that there is no bi-
nary linear deletion correcting code. This follows immediately
from our results.
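The rate-1/2 threshold stated above can be explored by brute force for very short binary codes. The following Python sketch is only an illustration, not the method of this paper. It assumes that a code corrects a single deletion exactly when no two distinct codewords yield the same word after one symbol is deleted from each. All function names are hypothetical and chosen for this example.

```python
from itertools import product


def single_deletions(word):
    """Return the set of words obtained by deleting exactly one symbol."""
    return {word[:i] + word[i + 1:] for i in range(len(word))}


def corrects_single_deletion(code):
    """True if no two distinct codewords share a word after one deletion."""
    seen = {}
    for c in code:
        for d in single_deletions(c):
            # setdefault returns the codeword already stored for d, if any.
            if seen.setdefault(d, c) != c:
                return False
    return True


def span_gf2(generators):
    """All GF(2) linear combinations of the generator rows (tuples of 0/1)."""
    n = len(generators[0])
    code = set()
    for coeffs in product((0, 1), repeat=len(generators)):
        word = tuple(
            sum(a * g[j] for a, g in zip(coeffs, generators)) % 2
            for j in range(n)
        )
        code.add(word)
    return code


def exists_correcting_code(n, k):
    """Exhaustively search binary linear codes of length n and dimension k
    for one that corrects a single deletion (feasible only for tiny n)."""
    vectors = list(product((0, 1), repeat=n))
    for gens in product(vectors, repeat=k):
        code = span_gf2(gens)
        if len(code) == 2 ** k and corrects_single_deletion(code):
            return True
    return False
```

Under these assumptions, `exists_correcting_code(4, 2)` finds a rate-1/2 code, for instance the one generated by 1100 and 0011, whereas `exists_correcting_code(4, 3)` finds none. Such a search covers only one short length, so it is consistent with the general statement but does not prove it.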
Not only are all linear codes of rate greater than 1/2 incapable
of correcting synchronization errors, but all cyclic codes,
except for repetition codes, are also unable to correct these errors.
This is unfortunate since cyclic codes are the most widely used
class of codes for correcting substitution errors due to the ease of
their implementation. This reason motivated us to study coding
schemes, based on cyclic codes, that can correct deletions and
insertions. In particular, we study extending and expurgating
cyclic codes for this purpose. Our results pertain to low-rate
cyclic codes. We show that by judiciously extending a cyclic
code by one symbol, i.e., inserting one extra symbol in each
codeword, we obtain a code capable of correcting synchroniza-
tion errors provided that the cyclic code has rate at most 1/3 or
1/2 depending, respectively, on whether or not the code is binary.
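One way to see why a cyclic code other than a repetition code cannot correct a deletion is to compare a codeword with its cyclic shift: deleting the first symbol of the codeword and the last symbol of the shifted codeword gives the same word. The Python sketch below illustrates this observation. It is not necessarily the argument used in this paper, and the function names are hypothetical.

```python
def cyclic_shift(word):
    """Rotate a word one position to the left."""
    return word[1:] + word[:1]


def common_word_after_deletion(word):
    """Delete the first symbol of `word` and the last symbol of its cyclic
    shift; both deletions produce the same word, which is returned."""
    shifted = cyclic_shift(word)
    from_word = word[1:]        # first symbol of the codeword deleted
    from_shift = shifted[:-1]   # last symbol of the shifted codeword deleted
    assert from_word == from_shift
    return from_word


# Example word: coefficients of g(x) = 1 + x + x^3, which generates the
# binary cyclic Hamming code of length 7.
word = "1101000"
shifted = cyclic_shift(word)
received = common_word_after_deletion(word)
```

Here `word` and `shifted` are distinct words of the same cyclic code, yet a receiver that sees `received` cannot tell which of the two was sent. The ambiguity disappears only when every codeword equals its own cyclic shift, that is, when all symbols in each codeword are identical.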
We also consider expurgating a cyclic code, i.e., deleting
codewords from it, such that the resulting expurgated code
is capable of correcting synchronization errors. For a cyclic
code of rate 1/2 or less, we determine the maximum size of
an expurgated code with this error correcting capability. We
show that deleting codewords from a cyclic code of rate 1/2
or less to obtain an expurgated code that can correct deletions
and insertions costs at most two information bits if the code is
binary and one information symbol if the code is nonbinary.
In this paper, we assume that the beginning and the end of
each received sequence corresponding to a transmitted code-
word are known, which allows for independent decoding of the
codewords. (This assumption, which is commonly made in
the literature, can be achieved by inserting periodic markers between
codewords.) We also assume that each codeword may