Abstract

We describe two novel techniques for multiplying polynomials that help accelerate popular public key cryptographic algorithms such as RSA and key exchange techniques such as Elliptic Curve Diffie-Hellman. The first technique is based on an algorithm for generating one-iteration Karatsuba-like formulae using graphs. The novelty of our approach lies in the correlation between graph properties (i.e., vertices, edges and sub-graphs) and the Karatsuba-like terms of big number multiplication routines. The second technique is an improvement over the one-iteration extension to Karatsuba proposed by Weimerskirch and Paar [2] that yields better performance when the input polynomials have an odd number of coefficients. We present experimental data showing that our techniques substantially boost the performance of public key and key exchange algorithms.

1. Introduction

Public key and key exchange algorithms such as RSA, DSA and Elliptic Curve Diffie-Hellman rely on multiplying polynomials of large degree. Polynomials may represent integers, as in RSA, or elements of finite fields, as in Elliptic Curve Cryptography (ECC). In both cases, fast polynomial multiplication techniques help accelerate these cryptographic algorithms and protocols. Accelerating cryptographic algorithms matters for two reasons: it improves the performance of many networking applications, such as e-commerce transactions, online banking, storage, virtual private networking and e-mail authentication; and current software implementations of public key algorithms consume a substantial number of processor clock cycles.

In this paper we propose two new techniques for fast polynomial multiplication and discuss how they can be applied to improve the performance of public key cryptographic algorithms. The first technique is a novel approach for extending the Karatsuba algorithm [6] to large operand sizes without paying the cost of recursion.
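To make the two representations concrete, the following Python sketch (illustrative only, not taken from this paper) treats a big integer as a polynomial evaluated at the radix $x = 2^{64}$, and a binary-field element as a polynomial over GF(2) packed into the bits of an integer. The 64-bit limb width, the lowest-degree-first coefficient order, and the helper names `to_limbs`, `poly_mul`, `eval_at_radix` and `clmul` are assumptions made for illustration. The same coefficient-by-coefficient product underlies both cases; the difference is that integer coefficients produce carries, whereas GF(2) coefficients combine by XOR. For a binary-field curve such as NIST B-233, the carry-less product would still need reduction modulo the field polynomial, which this sketch omits.

```python
# Illustrative sketch: W, MASK and all function names are assumptions, not the paper's.
W = 64                     # assumed limb (coefficient) width in bits
MASK = (1 << W) - 1


def to_limbs(n, count):
    """Write a non-negative integer (which must fit in `count` limbs)
    as coefficients a_0..a_{count-1} in radix 2^W, lowest degree first."""
    return [(n >> (W * i)) & MASK for i in range(count)]


def poly_mul(a, b):
    """Schoolbook product of coefficient lists; result coefficients may exceed W bits."""
    c = [0] * (len(a) + len(b) - 1)
    for i, ai in enumerate(a):
        for j, bj in enumerate(b):
            c[i + j] += ai * bj
    return c


def eval_at_radix(c):
    """Evaluate c(x) at x = 2^W; summing the shifted terms resolves the carries."""
    return sum(ck << (W * k) for k, ck in enumerate(c))


def clmul(a, b):
    """Carry-less product of two GF(2)[x] polynomials packed into ints (bit k = x^k).
    Coefficient addition is XOR, so nothing carries into neighboring bits."""
    r = 0
    while b:
        if b & 1:
            r ^= a
        a <<= 1
        b >>= 1
    return r


if __name__ == "__main__":
    x, y = 3**80, 7**45            # both fit in three 64-bit limbs
    assert eval_at_radix(poly_mul(to_limbs(x, 3), to_limbs(y, 3))) == x * y
    print(bin(clmul(0b11, 0b11)))  # prints 0b101: (x + 1)^2 = x^2 + 1 over GF(2)
```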
The second technique is an improvement over the one-iteration extension to Karatsuba proposed by Weimerskirch and Paar [2] that yields better performance when the input polynomials have an odd number of coefficients.

The Karatsuba algorithm was proposed in 1962 as a way to reduce the number of scalar multiplications required to compute the product of two integers or polynomials. The classic algorithm accepts as input two polynomials of degree 1, i.e., $a(x) = a_1 x + a_0$ and $b(x) = b_1 x + b_0$, and computes their product

$$a(x)b(x) = a_1 b_1 x^2 + (a_1 b_0 + a_0 b_1)x + a_0 b_0$$

using 3 scalar multiplications. The naïve (also called 'schoolbook') way of multiplying $a(x)$ and $b(x)$ instead performs 4 scalar multiplications, computing the products $a_0 b_0$, $a_0 b_1$, $a_1 b_0$ and $a_1 b_1$. Karatsuba showed that only three scalar multiplications are needed: $a_1 b_1$, $(a_1 + a_0)(b_1 + b_0)$ and $a_0 b_0$. Once these are performed, the missing coefficient $a_1 b_0 + a_0 b_1$ is obtained as the difference $(a_1 + a_0)(b_1 + b_0) - a_0 b_0 - a_1 b_1$. For larger operands, the Karatsuba algorithm can be applied recursively.

One of the most important open problems associated with Karatsuba is how to apply the algorithm to big numbers without paying the cost of recursion. Avoiding recursion is important because recursive Karatsuba routines cannot take full advantage of the hardware-level parallelism supported by a processor architecture or hardware acceleration unit. In this paper we present novel multiplication techniques that are one-iteration and involve significantly fewer scalar multiplications than other known one-iteration approaches. For 1024-bit and 2048-bit RSA computations, our techniques yield a 35-42% gain compared to OpenSSL when running on an Intel 3 GHz Core 2 Duo processor.
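The multiplication counts discussed above can be checked with a small Python sketch (illustrative only, not the authors' implementation). It compares the schoolbook method, Karatsuba on degree-1 inputs, recursive Karatsuba, and the one-iteration extension of Weimerskirch and Paar [2] as it is commonly stated: for $n$ coefficients it forms the $n$ products $a_i b_i$ and the $n(n-1)/2$ products $(a_i + a_j)(b_i + b_j)$ for $i < j$, for $n(n+1)/2$ multiplications in total, and recovers each cross term as $(a_i + a_j)(b_i + b_j) - a_i b_i - a_j b_j$. The `Counter` class and all function names are hypothetical; coefficients are stored lowest degree first, and the recursive version assumes the length is a power of two.

```python
class Counter:
    """Hypothetical helper that tallies scalar multiplications."""
    def __init__(self):
        self.mults = 0

    def mul(self, x, y):
        self.mults += 1
        return x * y


def schoolbook(a, b, ctr):
    """Naive product: every a_i is multiplied by every b_j (n^2 multiplications)."""
    c = [0] * (len(a) + len(b) - 1)
    for i in range(len(a)):
        for j in range(len(b)):
            c[i + j] += ctr.mul(a[i], b[j])
    return c


def karatsuba_deg1(a, b, ctr):
    """a = (a0, a1), b = (b0, b1); returns [c0, c1, c2] using 3 multiplications."""
    a0, a1 = a
    b0, b1 = b
    p0 = ctr.mul(a0, b0)
    p2 = ctr.mul(a1, b1)
    pm = ctr.mul(a1 + a0, b1 + b0)
    return [p0, pm - p0 - p2, p2]  # middle term = (a1+a0)(b1+b0) - a0b0 - a1b1


def karatsuba_rec(a, b, ctr):
    """Recursive Karatsuba; len(a) == len(b) must be a power of two."""
    n = len(a)
    if n == 1:
        return [ctr.mul(a[0], b[0])]
    h = n // 2
    a0, a1, b0, b1 = a[:h], a[h:], b[:h], b[h:]
    low = karatsuba_rec(a0, b0, ctr)
    high = karatsuba_rec(a1, b1, ctr)
    mid = karatsuba_rec([x + y for x, y in zip(a0, a1)],
                        [x + y for x, y in zip(b0, b1)], ctr)
    c = [0] * (2 * n - 1)
    for i in range(len(low)):
        # a*b = low + x^h * (mid - low - high) + x^(2h) * high
        c[i] += low[i]
        c[i + h] += mid[i] - low[i] - high[i]
        c[i + 2 * h] += high[i]
    return c


def wp_one_iteration(a, b, ctr):
    """One-iteration Karatsuba extension (Weimerskirch-Paar form): no recursion,
    n products a_i*b_i plus n(n-1)/2 products (a_i+a_j)(b_i+b_j)."""
    n = len(a)
    d = [ctr.mul(a[i], b[i]) for i in range(n)]
    c = [0] * (2 * n - 1)
    for i in range(n):
        c[2 * i] += d[i]
        for j in range(i + 1, n):
            dij = ctr.mul(a[i] + a[j], b[i] + b[j])
            c[i + j] += dij - d[i] - d[j]  # equals a_i*b_j + a_j*b_i
    return c


if __name__ == "__main__":
    a, b = [3, 1, 4, 1, 5, 9, 2, 6], [2, 7, 1, 8, 2, 8, 1, 8]
    for name, f in [("schoolbook", schoolbook),
                    ("recursive Karatsuba", karatsuba_rec),
                    ("one-iteration", wp_one_iteration)]:
        ctr = Counter()
        f(a, b, ctr)
        print(name, ctr.mults)
```

For eight coefficients the sketch reports 64 multiplications for the schoolbook method, 27 for recursive Karatsuba and 36 for the one-iteration formula. The one-iteration form spends more multiplications than recursive Karatsuba, but all of its products are independent of one another, which is what makes a recursion-free formulation attractive for the parallelism argument above.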
For the NIST B-233 elliptic curve, our techniques yield a 36% gain on the same processor. Finally, on a hypothetical next-generation processor architecture with 128-bit wide multiplication support in the instruction set, our techniques accelerate RSA by 2.84X.

The paper is structured as follows. Section 2 presents related work. Section 3 presents the intuition behind our approach. Section 4 presents our first technique, and Section 5 discusses its correctness. Section 6 presents our second technique. Section 7 discusses the performance of our algorithms. Finally, Section 8 provides some concluding remarks.

Fast Multiplication Techniques for Public Key Cryptography
Vinodh Gopal, Satyajit Grover and Michael E. Kounavis
Intel Corporation, USA