The Performance Analysis and Hardware Acceleration of Crypto-Computations for Enhanced Security

Jed Kao-Tung Chang, Shaoshan Liu, Jean-Luc Gaudiot
Dept. of Electrical Engineering and Computer Science
University of California
Irvine, CA 92697, USA
{jedc, shaoshal, gaudiot}@uci.edu

Chen Liu
Department of Electrical and Computer Engineering
Florida International University
Miami, FL 33174, USA
cliu@fiu.edu

Abstract: Security is very important in modern life because most information is now stored in digital format. A good security mechanism preserves the secrecy and integrity of information and hence plays an important role in modern information exchange. However, cryptographic algorithms are extremely expensive in terms of execution time. To make data hard to crack, the encryption/decryption process executes many arithmetic and logical operations along with a large amount of data movement. This means that cryptographic applications are both computation and memory intensive, so using a general-purpose processor in this scenario is not very cost-effective. This study addresses that problem. In contrast to previous designs, we use a performance analyzer to identify “hotspot” functions across a set of benchmarks. A hotspot function consumes a substantial portion of the execution time of its algorithm. We then translate these hotspot functions into hardware accelerators to improve performance. Overall, we achieve speedups of 34 to 83 times.

Keywords-security; crypto computations; hardware acceleration; performance analyzer.

I. INTRODUCTION

Our ultimate goal is to provide hardware-assisted solutions that improve the performance of crypto-computations in modern enterprise and IT systems. In this study, we collected nine cryptography algorithms as our benchmarks: AES [3], 3DES, RC5, MD5, IDEA, SHA1, Blowfish, ECC, and RSA [4].
We chose these algorithms because they are popular and because their program structures are representative of contemporary cryptography work [1, 2]. We used VTune [5] as our performance analyzer to examine the hotspot functions. VTune analyzes software performance on IA-32 and Intel 64-based machines. It collects performance data from the application running on the host system, then organizes and displays the data in an interactive view. VTune’s call graph view provides a tree structure that shows the call relationships among all functions. VTune also reports the Self Time and Total Time of each function. If a function’s Total Time is high but its Self Time is low, at least one of its subroutines must carry a heavy workload. This helps us identify the “hotspot” functions and the percentage of the total execution time of each benchmark that they occupy. Fig. 1 shows this percentage for the various algorithms. Note that for execution time we consider only the crypto-computation part of an application (key setup, encryption, and decryption), excluding file I/O and some system calls within the dynamic-link library.

Figure 1. Execution Rate of Hotspot Function(s).

II. CANDIDATE ALGORITHMS SELECTION

To select candidates for hardware acceleration, we need to consider two aspects of the hotspot function(s): their percentage of the total execution time, and the relationship between the hotspot function(s) and the overall algorithm. The first aspect is obvious: we would like to choose a hotspot function with a high execution rate. If the rate is too low, accelerating the function cannot significantly improve the performance of the whole algorithm. The second aspect is equally important. If a hotspot function is exactly the main process of the overall algorithm, such as the encryption/decryption part, the hardware cost will be too high.
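The Self Time and Total Time reasoning from the introduction can be illustrated with a minimal sketch. It is not VTune's API or output format. The call tree, the function names (crypto_main, key_setup, encrypt, round_function), and all timings are hypothetical toy values chosen for illustration, not measurements from our benchmarks. The sketch assumes that a function's Total Time equals its Self Time plus the Total Time of its callees.

```python
from dataclasses import dataclass, field


@dataclass
class Func:
    """One node of a call tree; all names and timings here are made up."""
    name: str
    self_time: float                      # time spent in the function body itself
    callees: list = field(default_factory=list)

    def total_time(self) -> float:
        # Assumed model: Total Time = Self Time + Total Time of every callee.
        return self.self_time + sum(c.total_time() for c in self.callees)


def walk(node):
    """Yield every function in the call tree, root first."""
    yield node
    for callee in node.callees:
        yield from walk(callee)


def hotspot_shares(root):
    """Map each function name to Self Time / Total Time of the root."""
    whole = root.total_time()
    return {f.name: f.self_time / whole for f in walk(root)}


# Toy call tree (illustrative numbers only, in arbitrary time units).
round_fn = Func("round_function", self_time=70.0)
key_setup = Func("key_setup", self_time=10.0)
encrypt = Func("encrypt", self_time=15.0, callees=[round_fn])
main = Func("crypto_main", self_time=5.0, callees=[key_setup, encrypt])

shares = hotspot_shares(main)
hotspot = max(shares, key=shares.get)     # function with the largest Self Time share
print(hotspot, shares[hotspot])
```

In this toy tree, encrypt has a Total Time of 85 but a Self Time of only 15, which points to its callee round_function as the place where the work is actually done. The share computed for that callee plays the role of the execution rate plotted in Fig. 1.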
The cost is high in that case because implementing the main process requires many hardware instructions and, accordingly, many hardware components, which would occupy too much die area. Thus, a good candidate is a hotspot function with a high execution rate and a small size.

[Figure 1 plot: axis ticks from 0 to 1 in steps of 0.1; benchmarks shown: 3DES, AES Decrypt, MD5, AES Encrypt, IDEA, Blowfish, RSA, RC5, ECC, SHA1.]

Now let us consider the execution rate, which is shown in Figure 1. We can see that the hotspot functions account for the majority of the total execution time for most of the benchmarks. In 3DES, MD5,