SOFTWARE: PRACTICE AND EXPERIENCE
Softw. Pract. Exper. 2013; 00:1–18
Published online in Wiley InterScience (www.interscience.wiley.com). DOI: 10.1002/spe

This is the "accepted version" of a paper to appear in Software: Practice and Experience. http://dx.doi.org/10.1002/spe.2257

High-Speed Parallel Implementations of the Rainbow Method Based on Perfect Tables in a Heterogeneous System

Jung Woo Kim¹, Jungjoo Seo¹, Jin Hong², Kunsoo Park¹ and Sung-Ryul Kim³

¹ Department of Computer Science and Engineering and Institute of Computer Technology, Seoul National University, Seoul 151-747, Korea
² Department of Mathematical Sciences and ISaC, Seoul National University, Seoul 151-747, Korea
³ Division of Internet and Media, Konkuk University, Seoul 143-701, Korea

SUMMARY

The computing power of graphics processing units (GPUs) has increased rapidly, and there has been extensive research on general-purpose computing on GPUs (GPGPU) for cryptographic algorithms such as RSA, ECC, NTRU, and AES. With the rise of GPGPU, commodity computers have become complex heterogeneous GPU+CPU systems. This new architecture poses new challenges and opportunities in high-performance computing. In this paper, we present high-speed parallel implementations of the rainbow method based on perfect tables, which is known as the most efficient time-memory tradeoff, on a heterogeneous GPU+CPU system. We give a complete analysis of the effect of multiple checkpoints on reducing the cost of false alarms, and take advantage of it for load balancing between the GPU and the CPU. On the GTX460, our implementation is about 1.86 and 3.25 times faster than the other GPU-accelerated implementations RainbowCrack and Cryptohaze, respectively; on the GTX580, it is 1.53 and 2.40 times faster. Copyright © 2013 John Wiley & Sons, Ltd.

Received . . .

KEY WORDS: GPGPU; CUDA; Heterogeneous Computing; Cryptanalysis; Cryptanalytic Time-Memory Tradeoff; Rainbow Method

* Correspondence to: Kunsoo Park, Department of Computer Science and Engineering and Institute of Computer Technology, Seoul National University, Seoul 151-747, Korea. E-mail: kpark@theory.snu.ac.kr

This article shares much of its material with our previous work [1], presented at INDOCRYPT 2012. However, this work treats the perfect table case, whereas the previous work covered the non-perfect table case.

1. INTRODUCTION

With its rapid evolution from a graphics processor into a programmable parallel processor, the GPU has become a many-core, multi-threaded multiprocessor that excels not only at graphics but also at computing applications. Today's GPUs have hundreds of parallel processor cores executing tens of thousands of parallel threads. With this large number of processors, GPUs are used to accelerate mathematical and scientific workloads. General-purpose computing on GPUs (GPGPU) was first introduced in 2006, when NVIDIA unveiled CUDA [2]. CUDA lets programmers control GPUs easily by writing programs in a C-like language. Recently, researchers and developers have enthusiastically adopted CUDA and GPU computing for cryptographic algorithms. In 2007, Manavski et al. efficiently implemented the Advanced Encryption Standard (AES) algorithm using CUDA [3]. In 2008, Szerwinski and Güneysu made use of CUDA for GPGPU processing of asymmetric cryptosystems (RSA, DSA, ECC) [4]. In 2009, Bernstein et al. showed that GPUs can be used for cryptanalysis as well as implementation of