IEEE TRANSACTIONS ON INFORMATION THEORY, VOL. 53, NO. 11, NOVEMBER 2007 4293
On the Nonlinear Complexity and Lempel–Ziv Complexity
of Finite Length Sequences
Konstantinos Limniotis, Nicholas Kolokotronis, Member, IEEE, and
Nicholas Kalouptsidis, Senior Member, IEEE
Abstract—The nonlinear complexity of binary sequences and its
connections with Lempel–Ziv complexity are studied in this paper. A new
recursive algorithm is presented, which produces the minimal nonlinear
feedback shift register of a given binary sequence. Moreover, it is shown
that the eigenvalue profile of a sequence uniquely determines its nonlinear
complexity profile, thus establishing a connection between Lempel–Ziv
complexity and nonlinear complexity. Furthermore, a lower bound for the
Lempel–Ziv compression ratio of a given sequence is proved that depends
on its nonlinear complexity.
Index Terms—Compression, cryptography, eigenvalue, Lempel–Ziv
complexity, nonlinear complexity, nonlinear feedback shift registers,
sequences.
I. INTRODUCTION
Binary sequences play a significant role in numerous applications,
among them error control coding, spread spectrum communications,
and cryptography [1]–[3]. In particular, the security of cryptographic
systems depends strongly on the unpredictability, or pseudorandomness,
of the key streams [3]. Depending on the cryptographic application,
a sequence must exhibit several properties in order to be considered
pseudorandom. The nonlinear complexity of a sequence, also called
maximum order complexity or simply complexity, is an important
cryptographic measure; it is defined as the length of the shortest
feedback shift register (FSR) that generates the sequence. For a linear
feedback shift register (LFSR), the corresponding complexity measure
is referred to as the linear complexity or linear span of the sequence.
The computation of the minimal LFSR that generates the sequence is
efficiently solved by the Berlekamp–Massey algorithm (BMA) [4], [5].
Linear complexity has been widely studied in the literature using many
different approaches [6]–[12].
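The definition of nonlinear complexity above can be illustrated with a minimal brute-force sketch. It is not the recursive algorithm developed in this paper; it simply searches for the smallest register length m for which every m-tuple occurring in the sequence is always followed by the same symbol, so that some feedback function on m stages can generate the sequence. The function names are hypothetical, and the convention that a constant sequence has complexity 0 is an assumption made for this illustration.

```python
def nonlinear_complexity(seq):
    """Smallest m such that each m-tuple in seq has a unique successor.

    Assumption for this sketch: a constant (or empty) sequence has
    complexity 0, since a constant feedback function generates it.
    """
    n = len(seq)
    for m in range(n):
        successor = {}          # maps an m-tuple (register state) to the next symbol
        consistent = True
        for i in range(n - m):
            state = tuple(seq[i:i + m])
            nxt = seq[i + m]
            # setdefault stores nxt the first time a state is seen;
            # a later, different successor means no FSR of length m exists
            if successor.setdefault(state, nxt) != nxt:
                consistent = False
                break
        if consistent:
            return m
    return 0  # only reached for the empty sequence


def nonlinear_complexity_profile(seq):
    """Complexities of all prefixes seq[:1], seq[:2], ..., seq[:n]."""
    return [nonlinear_complexity(seq[:k]) for k in range(1, len(seq) + 1)]


if __name__ == "__main__":
    print(nonlinear_complexity([0, 1, 0, 1, 0, 1]))        # alternating sequence
    print(nonlinear_complexity([0, 0, 1, 1, 0, 0, 1, 1]))  # period-4 sequence
    print(nonlinear_complexity_profile([0, 0, 0, 1]))      # jump at the last symbol
```

The search costs roughly cubic time in the sequence length, which is acceptable for short examples but not a substitute for an efficient algorithm.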
By contrast, the general case of nonlinear complexity has not
been studied to the same extent. In [13], [14], a directed acyclic word
graph (DAWG) is used to exhibit the complexity profile of sequences
over arbitrary fields. The problem of computing the minimal nonlinear
FSR that generates a given set of sequences is studied in [15]. An
approximate probability distribution for the nonlinear complexity of
random binary sequences is derived in [16]. Recent results are provided
in [17], where the minimal nonlinear FSR generating a given sequence
is computed via an algorithmic approach, and in [18], where the special
case of a quadratic feedback function of the FSR is treated. More
recently, constructions of sequences with prescribed linear complexity
achieving the maximum possible nonlinear complexity are provided
in [19].
Manuscript received January 8, 2007; revised May 29, 2007. The material in
this correspondence was presented in part at the International Conference on
Sequences and Their Applications, Beijing, China, September 24–28, 2006.
The authors are with the Department of Informatics and Telecommunications,
National and Kapodistrian University of Athens, University Campus, 15784
Athens, Greece (e-mail: klimn@di.uoa.gr; nkolok@di.uoa.gr; kalou@di.uoa.gr).
Communicated by G. Gong, Associate Editor for Sequences.
Digital Object Identifier 10.1109/TIT.2007.907442
Several other pseudorandomness measures of sequences have been
proposed. The so-called Lempel–Ziv complexity of a sequence, introduced
in [20], is related to the number of cumulatively distinct patterns in
the sequence. The authors of [20] also defined the eigenvalue profile as
an alternative means of evaluating the complexity of a sequence, which is
strongly connected to the Lempel–Ziv complexity. The parsing procedure
presented therein for determining the Lempel–Ziv complexity is the basis
for the prominent Lempel–Ziv compression algorithms, in particular
versions LZ77 and LZ78, which are proposed in [21] and [22], respectively.
Both are asymptotically optimal, since the compression ratio approaches
the source entropy for all finite-alphabet stationary ergodic sources
[22], [23]. However, the compression ratio of a finite length sequence
can be far from optimal. Because a sequence cannot be considered
pseudorandom if it can be significantly compressed, the compressibility
of a sequence, defined as the degree to which it can be compressed,
constitutes an important cryptographic measure.
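As an illustration of counting cumulatively distinct patterns, the following is a minimal sketch of the exhaustive-history parsing that is commonly associated with the Lempel–Ziv complexity of [20]. It assumes the usual reading in which each new component is the shortest word, starting at the current position, that cannot be copied from the earlier part of the sequence (overlapping copies allowed); the last component may be an exception. The function names are hypothetical and the implementation favors clarity over speed.

```python
def lz76_parse(s):
    """Split the string s into the components of its exhaustive history.

    Assumption for this sketch: a word s[i:i+k] is "reproducible" if it
    occurs as a substring of s[:i+k-1], i.e. it can be copied from a start
    position before i, possibly overlapping the word itself.
    """
    components = []
    i, n = 0, len(s)
    while i < n:
        k = 1
        # grow the candidate word while it is still reproducible
        while i + k <= n and s[i:i + k] in s[:i + k - 1]:
            k += 1
        # the slice is clipped at n, so the final component may be reproducible
        components.append(s[i:i + k])
        i += k
    return components


def lz76_complexity(s):
    """Number of components in the exhaustive history of s."""
    return len(lz76_parse(s))


if __name__ == "__main__":
    example = "0001101001000101"
    print(lz76_parse(example))
    print(lz76_complexity(example))
```

A highly regular sequence such as an alternating one yields very few components, which matches the intuition that compressible sequences should not be regarded as pseudorandom.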
The relationship between several of the currently established
cryptographic criteria remains an open problem. In this paper we focus
on the connections between the nonlinear and Lempel–Ziv complexity,
motivated by a statement of Niederreiter [24]. For any periodic binary
sequence, we establish the dependence of the minimum achievable
compression ratio on its nonlinear complexity by deriving a lower
bound that depends on the nonlinear complexity; this improves the one
presented in [25]. Furthermore, a special class of highly compressible
sequences with prescribed nonlinear complexity is introduced and
analyzed, emphasizing the importance of compressibility as a
pseudorandomness measure. For sequences over arbitrary fields, a connection
between the nonlinear complexity profile of a sequence and its eigenvalue
profile is derived. A new recursive algorithm producing the minimal FSR
of a binary sequence is developed, generalizing the Berlekamp–Massey
algorithm to the nonlinear case. The proposed algorithm is considerably
more efficient than the one given in [17], as it recursively computes the
minimal FSR of any subsequence by applying Boolean algebra arguments.
The paper is organized as follows. In Section II the basic terminology
and definitions are introduced. Properties of the nonlinear complexity
of sequences over any field are presented in Section III. In Section IV,
we explore the relationship between the eigenvalue profile and the
nonlinear complexity profile of finite length sequences over any field, and
establish a connection between Lempel–Ziv and nonlinear complexity.
Based on the properties of the nonlinear complexity profile, a
recursive algorithm that computes the minimal FSR of any given binary
sequence is derived in Section V. The relationship between the nonlinear
complexity and the Lempel–Ziv compression ratio is derived in Section VI.
Finally, concluding remarks are given in Section VII.
II. PRELIMINARIES
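Let $\mathbb{F}_2$ denote the binary field. A boolean function with $n$ variables
is a mapping $f : \mathbb{F}_2^n \to \mathbb{F}_2$. The complement of a binary variable $x$ will
be denoted by $\bar{x} = x + 1$, where "$+$" represents addition modulo $2$. For
any boolean function $f$ a minterm is defined as
$x_1^{c_1} x_2^{c_2} \cdots x_n^{c_n}$, where $x_i^{1} = x_i$, and $x_i^{0} = \bar{x}_i$ [26].
Hence, there are $2^n$ minterms, each associated with a specific vector
or $n$-tuple $c = (c_1, \ldots, c_n)$. Note that each minterm is uniquely determined
by the property that it evaluates to $1$ if its $i$th variable is replaced by $c_i$,
e.g., the minterm of the $3$-tuple $(1, 0, 1)$ is $x_1 \bar{x}_2 x_3$.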
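The minterm definition can be checked with a small sketch. It assumes the usual convention that a variable with exponent 1 appears uncomplemented and a variable with exponent 0 appears complemented, so that the minterm of a tuple evaluates to 1 exactly at that tuple. The helper names are hypothetical, and an apostrophe in the printed labels stands for the complement.

```python
from itertools import product


def minterm(c):
    """Return the minterm associated with the binary tuple c as a function."""
    def evaluate(x):
        # literal i is x_i when c_i = 1 and the complement x_i + 1 (mod 2) when c_i = 0
        literals = (xi if ci == 1 else xi ^ 1 for xi, ci in zip(x, c))
        return int(all(literals))
    return evaluate


def minterm_label(c):
    """Readable form of the minterm; a trailing apostrophe marks a complement."""
    return " ".join(
        f"x{i}" if ci == 1 else f"x{i}'" for i, ci in enumerate(c, start=1)
    )


n = 3
tuples = list(product((0, 1), repeat=n))  # all 2^n binary n-tuples

if __name__ == "__main__":
    for c in tuples:
        # points of F_2^n at which this minterm equals 1
        support = [x for x in tuples if minterm(c)(x) == 1]
        print(c, minterm_label(c), support)
```

Each printed support contains a single point, the tuple that indexes the minterm, which is the uniqueness property stated above.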
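There are several ways to represent a boolean function. The Algebraic
Normal Form (ANF) of $f$ is defined as
$$f(x_1, \ldots, x_n) = \sum_{j \in \mathbb{F}_2^n} a_j \prod_{i \,:\, j_i = 1} x_i \qquad (1)$$
where the sum is taken modulo $2$, $a_j \in \mathbb{F}_2$, while $j = (j_1, \ldots, j_n)$ and
the empty product (for $j = 0$) equals $1$. Any boolean function with $n$ variables can also be represented
in its Disjunctive Normal Form (DNF), which is defined as the sum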
0018-9448/$25.00 © 2007 IEEE