High-Performance Integer Factoring with
Reconfigurable Devices
Ralf Zimmermann, Tim Güneysu and Christof Paar
Horst Görtz Institute for IT-Security, Ruhr-University Bochum, Germany
Email: {zimmermann,gueneysu,cpaar}@crypto.rub.de
Abstract—We present a novel FPGA-based implementation
of the Elliptic Curve Method (ECM) for the factorization of
medium-sized composite integers. More precisely, we demonstrate
an ECM implementation capable of determining the prime
factors of up to 2,424 151-bit integers per second using a
single Xilinx Virtex-4 SX35 FPGA. Using this implementation
on a cluster like the COPACOBANA is beneficial for attacking
cryptographic primitives like the well-known RSA cryptosystem
with advanced methods such as the Number Field Sieve (NFS).
To provide this vast number of integer factorizations per
FPGA, we make use of the available DSP blocks on each Virtex-
4 device to accelerate low-level arithmetic computations. This
methodology allows the development of a time-area efficient
design that runs 24 ECM cores in parallel, implementing both
phase 1 and phase 2 of the ECM. Moreover, our design is fully
scalable and supports composite integers in the range from 66 to
236 bits without any significant modifications to the hardware.
Compared to the implementation by Gaj et al., who reported
an ECM design for the same Virtex-4 platform, our improved
architecture provides a cost-performance ratio that is better
by a factor of 37.
Index Terms—Factorization, elliptic curve method, reconfig-
urable hardware, COPACOBANA.
I. INTRODUCTION
In 1987, the Elliptic Curve Method (ECM) was introduced
by H. W. Lenstra [1] as a new method for integer factorization,
generalizing the concept of Pollard’s p − 1 and Williams’ p + 1
methods [2], [3]. Although the ECM is known not to be the
fastest method for factorization with respect to asymptotic
time complexity, it is widely used to factor composite numbers
of up to 200 bits due to its very limited memory requirements.
The most prominent application that relies on the hardness
of the factorization problem is the RSA cryptosystem. An
attacker on RSA has to find the factorization of a composite
number n that is the product of two large primes p and q. In
practice, the RSA modulus n is typically 1024 bits or larger
and hence out of reach of the ECM. To date, numbers of this
size are best attacked with the most powerful methods known,
such as the Number Field Sieve (NFS).
However, the complex NFS¹ involves a search for relations
in which many mid-sized numbers must be tested for
smoothness, i.e., whether they are composed only of small
prime factors not larger than a fixed bound B. In this context,
the ECM is an important tool to determine the smoothness of
such integers, in particular due to its moderate resource
requirements.
¹ The NFS comprises four steps: polynomial selection, relation
finding, a linear algebra step and finally the square root step. The relation
finding step is the most time-consuming, taking roughly 90% of the runtime. For
more information on the NFS, refer to [4].
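The smoothness criterion used above can be illustrated with a minimal Python sketch. It uses plain trial division rather than the ECM and is only practical for small bounds; it is meant to make the definition concrete and is not the method of this paper. The function name `is_smooth` and its parameters are hypothetical and chosen for illustration.

```python
def is_smooth(n: int, bound: int) -> bool:
    """Return True if every prime factor of n is at most `bound`.

    Illustrative trial-division check (an assumption for this sketch,
    not the ECM-based test discussed in the text).
    """
    if n < 1:
        raise ValueError("n must be a positive integer")
    d = 2
    # Strip all factors d <= bound; stop early once d*d exceeds the cofactor.
    while d <= bound and d * d <= n:
        while n % d == 0:
            n //= d
        d += 1
    # The remaining cofactor is 1, a prime, or has only factors above `bound`.
    return n == 1 or n <= bound


if __name__ == "__main__":
    print(is_smooth(720, 5))   # 720 = 2^4 * 3^2 * 5
    print(is_smooth(721, 5))   # 721 = 7 * 103
```

In the relation finding step of the NFS, the numbers and bounds involved are far too large for trial division alone, which is why a method such as the ECM is used for this test instead.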
The fastest ECM implementations for retrieving factors
of composite integers are software-based; a state-of-the-art
system is the GMP-ECM software published by P. Zimmermann
et al. [5], which has been extended for use with GPUs
by Bernstein et al. [6]. As a promising alternative, efficient
hardware implementations of the ECM were first proposed in
2005: Šimka et al. [7] demonstrated the feasibility of implementing
the ECM in reconfigurable hardware by presenting a first
proof-of-concept implementation. Their results were improved
by Gaj et al. [8], [9], who also showed a complete hardware
implementation of ECM phase 2. However, the low-level arithmetic
in these implementations was realized only with
straightforward techniques within the configurable logic, which
leaves room for further improvement. To fill this gap,
de Meulenaer et al. [10] proposed an unrolled Montgomery
multiplier based on a two-dimensional pipeline on Xilinx
Virtex-4 FPGAs to accelerate the field arithmetic. However,
due to limitations in area and the long pipeline design, their
design efficiently supports only the first phase of the ECM.
Contribution: In this work we propose a novel ECM archi-
tecture for Xilinx Virtex FPGAs making use of DSP blocks
for the computationally intensive arithmetic. Our focus is to
accelerate the underlying field arithmetic of the ECM on
FPGAs without sacrificing the option to combine both phase
1 and 2 in a single core. We therefore adopt some high-level
design decisions from [8], such as the memory management and
the use of SIMD instructions; that design also supports both
phases on the same hardware. To improve the field arithmetic, we place
fundamental arithmetic functions like adders and multipliers
in embedded DSP blocks of modern FPGAs. To factor
large quantities of numbers, we finally describe our factorization
setup based on a variant of COPACOBANA (Cost
Optimized PArallel COde Breaker), a cluster system based
on FPGAs [11], [12].
Outline: We start with a short review of the mathematical
background and the concept of the ECM. In Section III
we first describe the cluster system COPACOBANA, which
represents the target platform of our work, and then discuss the
architecture of an ECM core and its corresponding arithmetic
components. Finally, we present our factorization results in
Section IV before we conclude with Section V.
2010 International Conference on Field Programmable Logic and Applications
978-0-7695-4179-2/10 $26.00 © 2010 IEEE
DOI 10.1109/FPL.2010.26