Addition-based exponentiation modulo $2^k$

A. Fit-Florea, D.W. Matula and M.A. Thornton

A novel method for performing exponentiation modulo $2^k$ is described. The algorithm has a critical path consisting of $k$ dependent shift-and-add modulo $2^k$ operations. Although 3 is the preferred exponent base, the algorithm can be extended easily in order to perform the general binary powering operation.

Introduction and background: The basic integer arithmetic operations of addition/subtraction, multiplication and division are implemented typically in hardware using $k$ bits of precision with $k$ usually 16, 32, or 64, and up to 1024 in the case of cryptography. Having a precision limited to $k$ bits makes the arithmetic operations equivalent to their corresponding residue arithmetic modulo $2^k$ operations along with appropriate overflow handling. When the hardware support does not include a large multiplier, there is a particular need for additive bit-serial algorithms for these and additional residue operations.

In this Letter we present a bit-serial algorithm for the fundamental residue arithmetic operation of powering (or exponentiation). Following [1] we herein employ $|n|_{2^k} = j$ to denote the congruence relation $n \equiv j \pmod{2^k}$ with the residue $j$ satisfying $0 \le j \le 2^k - 1$.

When computing the exponentiation operation $b^e \pmod{2^k}$ of a basis $b$ (our preferred case is $b = 3$), usually some variation of the square-and-multiply algorithm is being employed. In this method the squaring operation is performed sequentially obtaining $|3^{2^1}|_{2^k}, |3^{2^2}|_{2^k}, |3^{2^3}|_{2^k}, \ldots, |3^{2^{k-1}}|_{2^k}$. From these residues a subset is selected to be part of the product corresponding to $|3^e|_{2^k}$:

$$|3^e|_{2^k} = \left|3^{\sum_{i \in B_e} 2^i}\right|_{2^k} = \left|\prod_{i \in B_e} 3^{2^i}\right|_{2^k} = \left|\prod_{i \in B_e} |3^{2^i}|_{2^k}\right|_{2^k} \qquad (1)$$

The exponent $e$ is expressed as a sum of powers of 2 reflecting its binary representation, and $B_e$ is the set of weights for the 1 digits in the binary representation of $e$.
For example, $B_{19} = \{0, 1, 4\}$ since $19 = 2^0 + 2^1 + 2^4$. Using a square-and-multiply method, $O(k)$ squarings and $O(k/2)$ multiplications modulo $2^k$ are to be performed in the worst case [2]. Storing the results of the squaring operations $|3^{2^i}|_{2^k}$ in a $k$-entry lookup table reduces the computations needed to $O(k/2)$ multiplications modulo $2^k$. In the following we present a method that virtually replaces each multiplication with one shift and two concurrent add modulo $2^k$ operations, thus having the potential to improve a hardware implementation in both area and time over a square-and-multiply implementation.

Relevant algebraic properties: We note the fact that exponentiation modulo $2^k$ is cyclic with period $2^{k-2}$ [3], hence we consider, without loss of generality, the exponents $e$ to be in the range $0, 1, \ldots, (2^{k-2} - 1)$. The algebraic property that makes it possible to express any exponent $e$ as a sum of powers of 2 is the fact that $\{2^i : 0 \le i < (k - 2)\}$ is a basis for the additive group of residues $e$ modulo $2^{k-2}$. Decomposing $e$ as a sum of elements of another basis still produces a correct result. In the following we present such a basis and show that using it has the advantage of eliminating the need for multiplications when computing the exponentiation modulo $2^k$.

We denote the discrete logarithm modulo $2^k$ with logarithmic base 3 of $A$ (in case it exists) by $\mathrm{dlg}(A)$. This simply represents the exponent $e$ such that $3^e$ is congruent with $(A \bmod 2^k)$. That is: $|A|_{2^k} = |3^{\mathrm{dlg}(A)}|_{2^k}$. For more details the reader is referred to [3]. Also from [3], we mention the following result:

Lemma 1: Let $r$ be a residue modulo $2^k$ of the form

$$r = 1 + 2^i + 2^{i+1} R; \quad 2 < i < k; \quad 0 \le R < 2^{k-i-1} \qquad (2)$$

Its corresponding discrete logarithm $\mathrm{dlg}(r)$ is then of the form

$$\mathrm{dlg}(r) = 2^{i-2} + d_r 2^{i-1}; \quad \text{for some } d_r,\ 0 \le d_r < 2^{k-i-1} \qquad (3)$$

We use $t_i$ to denote what we call the two-ones residues modulo $2^k$: $t_i = |2^i + 1|_{2^k}$. The following observation comes as a direct consequence of Lemma 1.
Observation 1: The discrete logarithm of the two-ones residues $t_i$ is of the form

$$\mathrm{dlg}(t_i) = 2^{i-2} + 2^{i-1} y_i; \quad 2 < i < k; \quad \text{for some } y_i,\ 0 \le y_i < 2^{k-i-1}$$

In Table 1 we show the two-ones residues and their corresponding discrete logarithms for $k = 8$. As can be inferred directly from Observation 1, the set $BT = \{\mathrm{dlg}(t_i) : i = 1, 3, 4, \ldots, (k - 1)\}$ represents a basis for residues $e$, $0 \le e < 2^{k-2}$, in the sense that, again, any exponent $e$ can be represented as a sum of elements from $BT$. Consequently, $|3^e|_{2^k}$ can be expressed as a product:

$$|3^e|_{2^k} = \left|3^{\sum_{i \in b_e} \mathrm{dlg}(t_i)}\right|_{2^k} = \left|\prod_{i \in b_e} |3^{\mathrm{dlg}(t_i)}|_{2^k}\right|_{2^k} = \left|\prod_{i \in b_e} (2^i + 1)\right|_{2^k} = \left|\prod_{i \in b_e} t_i\right|_{2^k} \qquad (4)$$

for a set $b_e$ of indices unique to any $e$. Once the set $b_e$ is known, $|3^e|_{2^k}$ can be computed as a product of two-ones residues. Multiplying by $t_i = (2^i + 1)$ has the advantage that it can be performed as a modulo $2^k$ shift-and-add operation: $A \times t_i := A + (A \ll i)$, thus eliminating the need for a multiplier. In the following we show an algorithm for selecting the elements of the sets $b_e$ in a serial fashion.

Table 1: Two-ones discrete log table for $k = 8$

  i    t_i          dlg(t_i)
  1    0000 0011    00 0001
  3    0000 1001    00 0010
  4    0001 0001    11 0100
  5    0010 0001    10 1000
  6    0100 0001    01 0000
  7    1000 0001    10 0000

Exponentiation modulo $2^k$ algorithm:

Stimulus: An exponent $e$ (modulo $2^{k-2}$).
Response: $|3^e|_{2^k}$.
Method:
L1: $P := 1$; $e' := e$;
L2: if ($e'_0 = 1$) then $P := 11$; $e' := e' - 1$;
L3: for $i$ from 1 to $(k - 3)$ do
L4:     if ($e'_i = 1$) then
L5:         $e' := |e' - \mathrm{dlg}(t_{i+2})|_{2^{k-2}}$;
L6:         $P := |P + |P \ll (i + 2)|_{2^k}|_{2^k}$;
L7: Result: $P$.

The initialisation is performed in lines L1 and L2. The product $P$ is set to either 1 or 11 (binary, i.e. 3), corresponding to the least significant bit of $e$ being 0 or 1. The working exponent $e'$ is always set in such a way that $P$ corresponds to 3 raised to the exponent $(e - e')$ and the least significant $i$ digits of $e'$ are all 0s.
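As a quick numerical check of Table 1 and of the shift-and-add product, the following minimal Python sketch recomputes the two-ones residues and their discrete logarithms. The function names `dlg3`, `times_two_ones` and `two_ones_table` are hypothetical, and the discrete logarithm is found here by exhaustive search over the powers of 3, which is an assumption made for illustration only and is not the algorithm of [3].

```python
def dlg3(a, k):
    """Brute-force discrete log base 3 of a modulo 2**k (None if it does not exist)."""
    modulus = 1 << k
    period = 1 << (k - 2)              # powers of 3 repeat with period 2**(k-2)
    power = 1
    for e in range(period):
        if power == a % modulus:
            return e
        power = (power * 3) % modulus
    return None


def times_two_ones(a, i, k):
    """Multiply a by t_i = 2**i + 1 modulo 2**k using one shift and one add."""
    mask = (1 << k) - 1
    return (a + (a << i)) & mask


def two_ones_table(k):
    """Rows (i, t_i, dlg(t_i)) for i = 1, 3, 4, ..., k-1."""
    rows = []
    for i in [1] + list(range(3, k)):
        t = (1 << i) + 1
        rows.append((i, t, dlg3(t, k)))
    return rows


if __name__ == "__main__":
    k = 8
    width = k - 2                      # discrete logs need k-2 bits
    for i, t, d in two_ones_table(k):
        print(f"{i}  {t:0{k}b}  {d:0{width}b}")
```

For $k = 8$ the printed rows should agree with Table 1, up to the digit grouping used in the table.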
The algorithmic step of lines L3-L6 updates $e'$ by subtracting $\mathrm{dlg}(t_{i+2})$, the exponent of the two-ones residue $t_{i+2} = (2^{i+2} + 1)$, and updates the product $P$ to reflect the change in exponent, $P := P(2^{i+2} + 1)$. Eventually, after $(k - 2)$ steps, $e'$ becomes 0 and the 'product' $P$ corresponds to $|3^{e-0}|_{2^k} = |3^e|_{2^k}$. The values $\mathrm{dlg}(t_{i+2})$ can be computed beforehand (e.g. using the algorithm described in [3]), and stored in a lookup table of uncompressed size $(k - 2)^2$ bits.

Fig. 1 Iterative loop for L3-L6 of algorithm 1

ELECTRONICS LETTERS 20th January 2005 Vol. 41 No. 2
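The method of lines L1 to L7 can be sketched in software as follows. This is a minimal Python illustration under stated assumptions, not the hardware datapath of Fig. 1: the names `dlg_table` and `pow3_mod_2k` are hypothetical, `e_work` stands for the working exponent $e'$, and the lookup table is filled by walking through the powers of 3 rather than by the algorithm of [3].

```python
def dlg_table(k):
    """Map i -> dlg(t_i) for i = 3..k-1 (brute force; assumption, not the method of [3])."""
    modulus = 1 << k
    wanted = {(1 << i) + 1: i for i in range(3, k)}   # t_i -> i
    table = {}
    power = 1
    for e in range(1 << (k - 2)):                     # one full period of 3**e
        if power in wanted:
            table[wanted[power]] = e
        power = (power * 3) % modulus
    return table


def pow3_mod_2k(e, k, table=None):
    """Compute 3**e mod 2**k (k >= 3) using only shifts, adds and table lookups."""
    if table is None:
        table = dlg_table(k)
    mask_p = (1 << k) - 1              # reduction modulo 2**k
    mask_e = (1 << (k - 2)) - 1        # reduction modulo 2**(k-2)

    p = 1                              # L1
    e_work = e & mask_e
    if e_work & 1:                     # L2: absorb the least significant bit
        p = 3
        e_work -= 1
    for i in range(1, k - 2):          # L3: i = 1 .. k-3
        if (e_work >> i) & 1:          # L4
            e_work = (e_work - table[i + 2]) & mask_e   # L5
            p = (p + (p << (i + 2))) & mask_p           # L6: multiply by t_(i+2)
    return p                           # L7
```

For example, with $k = 8$ and $e = 19$ the loop fires at $i = 1$ and $i = 4$, giving $P = 3 \to 27 \to 219$, which agrees with Python's built-in `pow(3, 19, 256)`.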