Improved Area-Time Tradeoffs for Field Multiplication Using Optimal Normal Bases

J. Adikari, A. Barsoum, M.A. Hasan, A.H. Namin, and C. Negre

Abstract. In this paper, we propose new schemes for binary field multiplication with subquadratic arithmetic complexity using optimal normal bases. The schemes are based on a recently proposed method known as block recombination, which efficiently computes the sum of two Toeplitz matrix-vector products. Specifically, we take advantage of structural properties of the matrices and vectors that appear in the formulation of field multiplication using optimal normal bases. This yields new space and time complexity results for the corresponding bit-parallel multipliers.

Index Terms: Binary field, optimal normal basis, Toeplitz matrix, block recombination

1 INTRODUCTION

NORMAL bases for finite field representation were first proposed in 1888 by Kurt Hensel. In 1989, Mullin et al. discovered two classes of low-complexity normal bases, known as optimal normal bases (ONB) of type I and type II [13]. These two types of bases have been shown to outperform all other types of normal bases for building area-efficient multipliers suitable for cryptographic applications [13]. The literature contains a number of bit-parallel normal basis multipliers with quadratic ($O(n^2)$) space complexity, e.g., [12], [9], [10]. Recently, much attention has been paid to the design of architectures with subquadratic space complexity. In 2001, Leone [11] presented the first architecture offering subquadratic space complexity for type I ONB, via canonical basis multiplication using all-one polynomials. In [4], Fan and Hasan presented a Toeplitz matrix-vector product scheme to design subquadratic complexity multipliers for both type I and type II ONB.
In [15], von zur Gathen, Shokrollahi, and Shokrollahi proposed performing ONB-II multiplication by converting to a polynomial basis and then applying any suitable polynomial multiplication method. Owing to their very efficient basis conversion, their method outperforms the Fan-Hasan multiplier in space complexity. This method was slightly improved by Bernstein and Lange [2]. Recently, Hasan et al. [8] proposed a modification of the Toeplitz matrix multiplier of Fan and Hasan. This method, which the authors refer to as block recombination, decomposes the Fan-Hasan multiplier into different blocks and recombines them in a special way to reduce the space complexity.

In this work, we further study subquadratic complexity multiplication using ONB. We first show that the block recombination method can be used to design multipliers with better space complexity than the Fan-Hasan ONB multipliers. We then investigate vector and matrix symmetry properties that exist in the matrix-vector expression of ONB multiplication. The vector symmetry property enables us to formulate ONB-II multiplication as a sum of two Toeplitz matrix-vector products, in which one matrix multiplies a vector and the other multiplies its reverse. The matrix symmetry property reveals that the Toeplitz matrices used in ONB multiplication are made of a number of smaller Toeplitz matrices, some of which are transposes of each other. Our proposed ONB-II multiplier has a space complexity slightly higher than that of the multiplier based on the Bernstein-Lange work [2], but only about half of its gate delay. This gives the proposed multiplier improved hardware efficiency, measured as the number of multiplications per second per unit area.

The remainder of this paper is organized as follows: In Section 2, we briefly recall the Toeplitz matrix-vector product (TMVP) method. In Section 3, we apply the block recombination method to ONB-I multiplication.
In Section 4, we present block recombination for the two-way split ONB-II multiplier based on the vector and matrix symmetry properties. We then compare the complexity of our proposed schemes with recently published methods for field multiplication using optimal normal bases (Section 5). In Section 6, we present some results for hardware and software implementations and give concluding remarks.

Remark 1. In the sequel, we primarily focus on parallel hardware design of ONB multipliers. The space complexity of such a multiplier corresponds to the number of AND gates (denoted as $S_\otimes$) and the number of XOR gates (denoted as $S_\oplus$). The delay (i.e., computation time) of the multiplier, corresponding to the critical path of the circuit, is expressed in terms of the delay of an XOR gate ($D_X$) and the delay of an AND gate ($D_A$).

2 REVIEW OF TWO-WAY AND THREE-WAY FORMULA FOR TOEPLITZ MATRIX VECTOR PRODUCT

A Toeplitz matrix is an $n \times n$ matrix $T = [t_{i,j}]$, $i, j = 0, 1, \ldots, n-1$, such that $t_{i,j} = t_{i-1,j-1}$ for $1 \le i, j \le n-1$; that is, every entry is constant along its diagonal. If $2 \mid n$, one can use the two-way split formula shown in the upper part of Table 1. This formula expresses a TMVP of size $n$ through three TMVPs of size $n/2$ [3]. Similarly, if $3 \mid n$, one can use the three-way split shown in the right side of the same table, which expresses a TMVP of size $n$ through six TMVPs of size $n/3$ each [3]. Using the approach of [8], the Fan-Hasan multiplier architecture can be decomposed into four blocks: the component matrix formation (CMF), the component vector formation (CVF), the component multiplication (CM), and the reconstruction (R). The CMF and CVF blocks recursively compute the smaller size matrices and vectors.
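The four blocks can be illustrated for the two-way split with a short Python sketch over GF(2), where addition is XOR and multiplication is AND. This is a hedged illustration, not the authors' implementation. It assumes $n$ is a power of two, stores an $n \times n$ Toeplitz matrix by its $2n-1$ diagonal values `d` (so $t_{i,j}$ = `d[i - j + n - 1]`), and assumes the block layout $T = \begin{bmatrix} T_1 & T_0 \\ T_2 & T_1 \end{bmatrix}$, $V = \begin{bmatrix} V_0 \\ V_1 \end{bmatrix}$ with component ordering $[T_1+T_0,\ T_1,\ T_2+T_1]$ and $[V_1,\ V_0+V_1,\ V_0]$. The three half-size products are then $P_0 = (T_1+T_0)V_1$, $P_1 = T_1(V_0+V_1)$, $P_2 = (T_2+T_1)V_0$, and $TV = \begin{bmatrix} P_1+P_0 \\ P_1+P_2 \end{bmatrix}$. All function names are hypothetical.

```python
import random

def xor(a, b):
    """Componentwise GF(2) addition of two equal-length bit lists."""
    return [x ^ y for x, y in zip(a, b)]

def split_diag(d):
    """Split the 2n-1 diagonal values of an n x n Toeplitz matrix (n even)
    into those of the half-size blocks T0 (top-right), T1 (diagonal), T2 (bottom-left)."""
    m = (len(d) + 1) // 4
    return d[0:2 * m - 1], d[m:3 * m - 1], d[2 * m:4 * m - 1]

def cmf2(d):
    """Component matrix formation: [CMF(T1+T0), CMF(T1), CMF(T2+T1)], recursively."""
    if len(d) == 1:
        return d[:]
    t0, t1, t2 = split_diag(d)
    return cmf2(xor(t1, t0)) + cmf2(t1) + cmf2(xor(t2, t1))

def cvf2(v):
    """Component vector formation: [CVF(V1), CVF(V0+V1), CVF(V0)], recursively."""
    if len(v) == 1:
        return v[:]
    m = len(v) // 2
    v0, v1 = v[:m], v[m:]
    return cvf2(v1) + cvf2(xor(v0, v1)) + cvf2(v0)

def cm(mc, vc):
    """Component multiplication: one AND per component."""
    return [a & b for a, b in zip(mc, vc)]

def rec2(c_hat):
    """Reconstruction: map the 3^k components back to the n = 2^k product bits."""
    if len(c_hat) == 1:
        return c_hat[:]
    k = len(c_hat) // 3
    p0, p1, p2 = rec2(c_hat[:k]), rec2(c_hat[k:2 * k]), rec2(c_hat[2 * k:])
    return xor(p1, p0) + xor(p1, p2)

def tmvp_split2(d, v):
    """T*V over GF(2) for n a power of two, computed as R(CM(CMF(T), CVF(V)))."""
    return rec2(cm(cmf2(d), cvf2(v)))

def tmvp_naive(d, v):
    """Schoolbook reference product using t_{i,j} = d[i - j + n - 1]."""
    n = len(v)
    out = []
    for i in range(n):
        acc = 0
        for j in range(n):
            acc ^= d[i - j + n - 1] & v[j]
        out.append(acc)
    return out

random.seed(2013)
n = 16
d = [random.getrandbits(1) for _ in range(2 * n - 1)]
v = [random.getrandbits(1) for _ in range(n)]
assert len(cmf2(d)) == 3 ** 4  # n^{log2 3} components for n = 16
assert tmvp_split2(d, v) == tmvp_naive(d, v)
```

Because CMF depends only on $T$ and CVF only on $V$, they form independent circuits that feed the $n^{\log_2 3}$ AND gates of CM. Since R consists only of XORs, it is linear over GF(2), so the component products of two TMVPs could be added before a single reconstruction; the sketch does not attempt this or any other part of the block recombination of [8], and it does not model gate counts or delays.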
For example, in the two-way split case, with $T = \begin{bmatrix} T_1 & T_0 \\ T_2 & T_1 \end{bmatrix}$ and $V = \begin{bmatrix} V_0 \\ V_1 \end{bmatrix}$ split into blocks of size $n/2$, we define recursively

$$\mathrm{CMF}_2(T) = [\mathrm{CMF}_2(T_1 + T_0),\ \mathrm{CMF}_2(T_1),\ \mathrm{CMF}_2(T_2 + T_1)]$$

and

$$\mathrm{CVF}_2(V) = [\mathrm{CVF}_2(V_1),\ \mathrm{CVF}_2(V_0 + V_1),\ \mathrm{CVF}_2(V_0)].$$

The component multiplication CM consists of $n^{\log_2 3}$ (resp. $n^{\log_3 6}$) bitwise multiplications for the two-way (resp. three-way) split multiplier. The reconstruction block R converts the component representation $\hat{C}$ of the product to its vector representation $C$. In [8], the authors computed the complexity of each block for the two-way and three-way split approaches; these complexities are given in Table 2.

3 ONB-I MULTIPLIER

3.1 ONB-I Multiplier and TMVP

For the binary field $\mathbb{F}_{2^n}$, a normal basis has the following form

$$\{\beta, \beta^2, \beta^4, \ldots, \beta^{2^{n-1}}\},$$

IEEE TRANSACTIONS ON COMPUTERS, VOL. 62, NO. 1, JANUARY 2013

J. Adikari, A. Barsoum, M.A. Hasan, and A.H. Namin are with the ECE Department and CACR, University of Waterloo, Ontario, Canada. E-mail: {jithra.adikari, afekry}@uwaterloo.ca, ahasan@sisr.uwaterloo.ca, anamin@engmail.uwaterloo.ca.
C. Negre is with Team DALI, Université de Perpignan and LIRMM, Université Montpellier 2, France, and the ECE Department and CACR, University of Waterloo, Ontario, Canada. E-mail: christophe.negre@univ-perp.fr.
Manuscript received 3 Jan. 2011; revised 14 Sept. 2011; accepted 17 Sept. 2011; published online 1 Oct. 2011.
For information on obtaining reprints of this article, please send e-mail to: tc@computer.org, and reference IEEECS Log Number TC-2011-01-0002.
Digital Object Identifier no. 10.1109/TC.2011.198.
0018-9340/13/$31.00 © 2013 IEEE Published by the IEEE Computer Society