IEEE TRANSACTIONS ON INFORMATION THEORY, VOL. 53, NO. 11, NOVEMBER 2007 4293

On the Nonlinear Complexity and Lempel–Ziv Complexity of Finite Length Sequences

Konstantinos Limniotis, Nicholas Kolokotronis, Member, IEEE, and Nicholas Kalouptsidis, Senior Member, IEEE

Abstract—The nonlinear complexity of binary sequences and its connections with Lempel–Ziv complexity are studied in this paper. A new recursive algorithm is presented, which produces the minimal nonlinear feedback shift register of a given binary sequence. Moreover, it is shown that the eigenvalue profile of a sequence uniquely determines its nonlinear complexity profile, thus establishing a connection between Lempel–Ziv complexity and nonlinear complexity. Furthermore, a lower bound on the Lempel–Ziv compression ratio of a given sequence, depending on its nonlinear complexity, is proved.

Index Terms—Compression, cryptography, eigenvalue, Lempel–Ziv complexity, nonlinear complexity, nonlinear feedback shift registers, sequences.

I. INTRODUCTION

Binary sequences play a significant role in numerous applications, including error control coding, spread spectrum communications, and cryptography [1]–[3]. In particular, the security of cryptographic systems is strongly contingent on the unpredictability or pseudorandomness of the key streams [3]. Depending on the cryptographic application, a sequence must exhibit several properties in order to be considered pseudorandom. The nonlinear complexity of a sequence, also called maximum order complexity or simply complexity, is an important cryptographic measure; it is defined as the length of the shortest feedback shift register (FSR) that generates the sequence. For a linear feedback shift register (LFSR), the corresponding complexity measure is referred to as the linear complexity or linear span of the sequence. The computation of the minimal LFSR that generates a given sequence is efficiently solved by the Berlekamp–Massey algorithm (BMA) [4], [5].
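To make the definition of nonlinear complexity concrete, the following Python sketch finds, by exhaustive search, the smallest register length m for which every m-tuple occurring in a finite sequence is always followed by the same symbol, so that some feedback function on m stages reproduces the sequence. The function name `nonlinear_complexity` and the convention that a constant sequence has complexity 0 are assumptions made for illustration; this is neither the recursive algorithm developed in this paper nor the DAWG-based method of [13], [14].

```python
def nonlinear_complexity(seq):
    """Brute-force nonlinear (maximum order) complexity of a finite sequence.

    Hypothetical helper: returns the smallest m such that each window of m
    consecutive symbols in seq has a unique successor symbol.
    Assumption: a constant sequence is assigned complexity 0.
    """
    n = len(seq)
    for m in range(n):
        successor = {}          # maps an m-symbol state to the symbol that follows it
        consistent = True
        for i in range(n - m):
            state = tuple(seq[i:i + m])
            nxt = seq[i + m]
            # setdefault stores nxt on first sight, otherwise returns the stored symbol
            if successor.setdefault(state, nxt) != nxt:
                consistent = False  # same state, two different successors
                break
        if consistent:
            return m
    return n  # reached only for the empty sequence (n == 0)


if __name__ == "__main__":
    print(nonlinear_complexity([0, 0, 0, 1]))
    print(nonlinear_complexity([0, 1, 0, 1, 0, 1]))
```

Under these assumptions, the sequence 0001 requires a register of length 3, because the state 00 is followed by both 0 and 1, whereas the alternating sequence 010101 is generated by a register of length 1. The search is far slower than the dedicated algorithms cited in the text and is meant only to illustrate the definition.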
Linear complexity has been widely studied in the literature using many different approaches [6]–[12]. In contrast, the general case of nonlinear complexity has not been studied to the same extent. In [13], [14], a directed acyclic word graph (DAWG) is used to exhibit the complexity profile of sequences over arbitrary fields. The problem of computing the minimal nonlinear FSR that generates a given set of sequences is studied in [15]. An approximate probability distribution for the nonlinear complexity of random binary sequences is derived in [16]. Recent results are provided in [17], where the minimal nonlinear FSR generating a given sequence is computed via an algorithmic approach, and in [18], where the special case of a quadratic feedback function of the FSR is treated. More recently, constructions of sequences with prescribed linear complexity achieving the maximum possible nonlinear complexity are provided in [19].

Manuscript received January 8, 2007; revised May 29, 2007. The material in this correspondence was presented in part at the International Conference on Sequences and Their Applications, Beijing, China, September 24–28, 2006. The authors are with the Department of Informatics and Telecommunications, National and Kapodistrian University of Athens, University Campus, 15784 Athens, Greece (e-mail: klimn@di.uoa.gr; nkolok@di.uoa.gr; kalou@di.uoa.gr). Communicated by G. Gong, Associate Editor for Sequences. Digital Object Identifier 10.1109/TIT.2007.907442

Several other pseudorandomness measures of sequences have been proposed. The so-called Lempel–Ziv complexity of a sequence, which is related to the number of cumulatively distinct patterns in the sequence, is introduced in [20]. The authors also defined the eigenvalue profile as an alternative for evaluating the complexity of a sequence, which is strongly connected to the Lempel–Ziv complexity.
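The notion of cumulatively distinct patterns can be illustrated with a short Python sketch of an exhaustive-history style parsing: each new component is the shortest block that cannot be found earlier in the sequence, where the earlier occurrence may overlap the block itself except for its last symbol. The function name `lz76_parse` is hypothetical, and the exact parsing rule used here is an assumption about the formulation in [20], not a transcription of it.

```python
def lz76_parse(s):
    """Parse the string s into cumulatively distinct components.

    Hypothetical helper, assuming this rule: starting at position i, extend
    the candidate block s[i:i+l] while it already occurs inside s[:i+l-1];
    the first block that does not occur there becomes the next component.
    The final component may be a block that has been seen before.
    """
    n = len(s)
    components = []
    i = 0
    while i < n:
        l = 1
        # Grow the block while it is reproducible from the preceding symbols.
        while i + l <= n and s[i:i + l] in s[:i + l - 1]:
            l += 1
        components.append(s[i:i + l])  # slicing clips at the end of s
        i += l
    return components


def lz76_complexity(s):
    """Number of components produced by lz76_parse (assumed complexity measure)."""
    return len(lz76_parse(s))


if __name__ == "__main__":
    print(lz76_parse("0001101001000101"))
```

With this rule, the string 0001101001000101 is parsed into the six components 0, 001, 10, 100, 1000, 101. The repeated substring searches make the sketch much slower than practical implementations; it is meant only to show what is being counted.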
The parsing procedure presented in [20] to determine the Lempel–Ziv complexity is the basis for the prominent Lempel–Ziv compression algorithm, and in particular for the versions LZ77 and LZ78 proposed in [21] and [22], respectively. Both are asymptotically optimal, since the compression ratio approaches the source entropy for all finite-alphabet stationary ergodic sources [22], [23]. However, the compression ratio of a finite length sequence can be far from optimal. Because a sequence cannot be considered pseudorandom if it can be significantly compressed, the compressibility of a sequence, defined as the degree to which it can be compressed, constitutes an important cryptographic measure.

The relationship between several of the currently established cryptographic criteria remains an open problem. In this paper we focus on the connections between the nonlinear and Lempel–Ziv complexity, motivated by a statement of Niederreiter [24]. For any periodic binary sequence, we establish the dependence of the minimum achievable compression ratio on its nonlinear complexity by deriving a lower bound that depends on the nonlinear complexity; this bound improves the one presented in [25]. Furthermore, a special class of highly compressible sequences with prescribed nonlinear complexity is introduced and analyzed, emphasizing the importance of compressibility as a pseudorandomness measure. For sequences over arbitrary fields, a connection between the nonlinear complexity profile of a sequence and its eigenvalue profile is derived. A new recursive algorithm producing the minimal FSR of a binary sequence is developed, generalizing the Berlekamp–Massey algorithm to the nonlinear case. The proposed algorithm is considerably more efficient than the one given in [17], as it recursively computes the minimal FSR of any subsequence by applying Boolean algebra arguments.

The paper is organized as follows. In Section II the basic terminology and definitions are introduced.
Properties of the nonlinear complexity of sequences over any field are presented in Section III. In Section IV, we explore the relationship between the eigenvalue profile and the nonlinear complexity profile of finite length sequences over any field, and establish a connection between Lempel–Ziv and nonlinear complexity. Based on the properties of the nonlinear complexity profile, a recursive algorithm that computes the minimal FSR of any given binary sequence is derived in Section V. The relationship between the nonlinear complexity and the Lempel–Ziv compression ratio is derived in Section VI. Finally, concluding remarks are given in Section VII.

II. PRELIMINARIES

Let $\mathbb{F}_2 = \{0, 1\}$ denote the binary field. A boolean function $f$ with $n$ variables is a mapping $f : \mathbb{F}_2^n \to \mathbb{F}_2$. The complement of a binary variable $x$ will be denoted by $x' = x + 1$, where "+" represents addition modulo 2. For any boolean function $f$ a minterm is defined as $x_1^{c_1} x_2^{c_2} \cdots x_n^{c_n}$, where $c_i \in \mathbb{F}_2$, $x_i^{1} = x_i$, and $x_i^{0} = x_i'$ [26]. Hence, there are $2^n$ minterms, each associated with a specific vector or $n$-tuple $c = (c_1, \ldots, c_n)$. Note that each minterm is uniquely determined by the property that it evaluates to 1 if its $i$th variable is replaced by $c_i$ for every $i$, e.g., the minterm of the 3-tuple $(1, 0, 1)$ is $x_1 x_2' x_3$.

There are several ways to represent a boolean function. The Algebraic Normal Form (ANF) of $f$ is defined as

$$f(x_1, \ldots, x_n) = \sum_{j \in \mathbb{F}_2^n} a_j \, x^{j} \qquad (1)$$

where the sum is taken modulo 2, $a_j \in \mathbb{F}_2$, while $j = (j_1, \ldots, j_n)$ and $x^{j}$ denotes the product of those variables $x_i$ for which $j_i = 1$. Any boolean function $f$ with $n$ variables can also be represented in its Disjunctive Normal Form (DNF), which is defined as the sum