Masking the Energy Behavior of DES Encryption

H. Saputra, N. Vijaykrishnan, M. Kandemir, M. J. Irwin, R. Brooks, S. Kim and W. Zhang
Computer Science and Engineering, Applied Research Lab
The Pennsylvania State University
Emails: {saputra, vijay, kandemir, mji, sookim, wzhang}@cse.psu.edu, rrb5@only.arl.psu.edu

Abstract

Smart cards are vulnerable to both invasive and non-invasive attacks. Specifically, non-invasive attacks that use power and timing measurements to extract the cryptographic key have drawn a lot of negative publicity for smart card usage. The power measurement techniques rely on the data-dependent energy behavior of the underlying system. Further, power analysis can be used to identify the specific portions of the program being executed, so that timing glitches can be induced that may in turn help to bypass key checking. Thus, it is important to mask the energy consumption when executing the encryption algorithms. In this work, we augment the instruction set architecture of a simple five-stage pipelined smart card processor with secure instructions that mask the energy differences due to key-related data-dependent computations in DES encryption. The secure versions operate on the normal and complementary versions of the operands simultaneously to mask the energy variations due to value-dependent operations. However, this incurs the penalty of increased overall energy consumption in the data-path components. Consequently, we employ secure versions of instructions only for critical operations; that is, we use secure instructions selectively, as directed by an optimizing compiler. Using a cycle-accurate energy simulator, we demonstrate the effectiveness of this enhancement. Our approach masks the energy of critical operations while consuming 83% less energy than existing approaches that employ dual-rail circuits.

1. INTRODUCTION

With increased usage of smart cards, the financial incentive for security attacks becomes more attractive.
For example, smart card usage in North America surged by 37% in 2000, particularly in the financial segment, where security is a prime issue. Security attacks can be broadly classified as microprobing, software attacks, eavesdropping and fault generation [15]. Microprobing is an invasive technique that involves physical manipulation of the smart card circuit after opening the package. Software attacks focus on protocol or algorithm weaknesses, while eavesdropping techniques extract the secret keys and information by monitoring the power consumption, electromagnetic radiation and execution time. Fault generation techniques induce an intentional malfunction of the circuit by, for example, varying the supply voltage, inducing clock glitches, or exposing the circuit to ionizing radiation. Our focus in this work is on addressing the information leakage that results from eavesdropping on the power profile.

* This material is based on work supported in part by the Office of Naval Research under Award No. N00014-01-1-0859, MARCO 98-DF-600 GSRC Award, NSF Awards No. 0093082, 0093085, 0082064, 0103583. Any opinions, findings, and conclusions or recommendations expressed in this presentation are those of the authors and do not necessarily reflect the views of the Office of Naval Research.

A power analysis attack is based on analyzing the power consumption of an operation. The main rationale behind this kind of attack is that the power consumption of an operation depends on its inputs (in the case of cryptography, the inputs are the plaintext and the secret key). Differences in the values of the operands result in different switching activities in the memory, buses, datapath units (adders, multipliers, logical units) and pipeline registers of the smart card processor. Among these components, the processor datapath and buses exhibit more data-dependent energy variation than the memory components [16].
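To make the data dependence concrete, the following Python sketch counts bit transitions under a simple switching model. The 32-bit width, the function names, and the assumption that lines are precharged to zero before each evaluation are illustrative choices, not details of the processor studied here. Under those assumptions, driving a value alone costs a number of transitions that depends on the value, whereas driving the value together with its complement always costs the same number.

```python
WIDTH = 32                 # assumed word width in bits (illustrative)
MASK = (1 << WIDTH) - 1


def hamming_weight(x: int) -> int:
    """Number of 1 bits in the low WIDTH bits of x."""
    return bin(x & MASK).count("1")


def bus_toggles(values) -> int:
    """Count bit flips as successive values are driven on a bus.

    Assumes the bus initially holds all zeros.
    """
    toggles = 0
    prev = 0
    for v in values:
        toggles += hamming_weight(prev ^ v)
        prev = v
    return toggles


def precharged_pair_toggles(value: int) -> int:
    """Transitions when a value and its complement are driven together.

    Assumes both sets of lines are precharged to zero first, so each
    1 bit costs one transition. The total is WIDTH for every value.
    """
    return hamming_weight(value) + hamming_weight(~value)
```

For instance, `bus_toggles([0x0000FFFF, 0xFFFF0000])` counts 16 transitions for the first value and 32 for the second, while `precharged_pair_toggles` returns 32 for any input. The constant total comes at the cost of more transitions overall, which matches the energy penalty noted in the abstract.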
There are different degrees of sophistication involved in such power-analysis-based attacks. Simple Power Analysis (SPA) [7] uses only a single power consumption trace for an operation. From this power trace, an attacker can identify the operations being performed (such as whether a branch at point p is taken or not, or whether an exponentiation operation is performed). Combined with knowledge of the underlying algorithm being implemented, such information can reveal the secret key. For example, when a branch is taken based on a particular bit of a secret key being zero, the attacker can identify this bit by monitoring the power consumption difference between a taken and a not-taken branch. Protecting against this type of simple attack can be achieved fairly easily by restructuring the code. For example, a restructured algorithm is provided in [3] to eliminate branch conditions that were initially revealing the secret key information. Also, techniques that randomly introduce noise into the power measurement can mislead simple power analysis. An example of such a technique involves adding dummy modules and activating them at random time intervals. These modules consume additional power, skewing the original power profile. However, such techniques only provide protection from straightforward attacks. Higher-order power analysis techniques can be used to circumvent these protection mechanisms.

Differential power analysis (DPA) is currently the most popular higher-order power analysis. This scheme utilizes power profiles gathered from several runs and relies on the data-dependent power consumption variation to break the key [7]. In [5], Goubin et al. show how the secret key is guessed by using 1000 sample inputs and their corresponding 1000 power consumption traces. First, the mean of all these power consumption traces, denoted M, is obtained.
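As a rough illustration of the difference-of-means test behind DPA, the Python sketch below simulates 1000 noisy power samples and scores every key guess by how far the mean of one predicted-bit group lies from the overall mean M. Everything in it is a toy assumption rather than the setup of [5] or [7]: the 8-bit key, the randomly generated substitution table (not a DES S-box), the leakage model in which each power sample equals the intermediate bit plus Gaussian noise, and all function names.

```python
import random
from statistics import fmean

rng = random.Random(2002)  # fixed seed so the sketch is repeatable

# Hypothetical 8-bit substitution table (a random permutation).
# This is a stand-in for illustration only, NOT a DES S-box.
SBOX = list(range(256))
rng.shuffle(SBOX)


def target_bit(plaintext: int, key: int) -> int:
    """Intermediate bit b: lowest bit of SBOX[plaintext XOR key]."""
    return SBOX[plaintext ^ key] & 1


def simulate_traces(secret_key: int, n: int = 1000, noise: float = 0.5):
    """Return n (plaintext, power) pairs.

    Assumed leakage model: power = b + Gaussian noise.
    """
    traces = []
    for _ in range(n):
        p = rng.randrange(256)
        power = target_bit(p, secret_key) + rng.gauss(0.0, noise)
        traces.append((p, power))
    return traces


def dpa_score(traces, guess: int) -> float:
    """|mean(G) - M|, where G holds the traces whose predicted b is 1."""
    overall_mean = fmean([power for _, power in traces])
    group = [power for p, power in traces if target_bit(p, guess) == 1]
    if not group:  # guard against an empty group
        return 0.0
    return abs(fmean(group) - overall_mean)


def recover_key(traces) -> int:
    """Return the key guess with the largest difference of means."""
    return max(range(256), key=lambda g: dpa_score(traces, g))


SECRET_KEY = 0x3C  # arbitrary key for the simulation
traces = simulate_traces(SECRET_KEY)
recovered = recover_key(traces)
```

In this toy setting the correct guess sorts the traces by the bit that actually influenced the power, so its group mean stands well apart from M, while a wrong guess sorts them almost at random and its group mean stays close to M. Averaging over many traces is also what suppresses the added Gaussian noise.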
Next, the attacker guesses a particular key and, based on the input, determines a theoretical value for one of the intermediate bits (b) generated by the program. The outcome of this bit is then used to separate the 1000 inputs into two groups (G1 and G2) based on whether b=0 or b=1. If the mean of the power profiles in group G1 differs significantly from M, this indicates that the guess was correct. This difference is a manifestation of the power differences in the downstream computations that used the bit b. As evident from the above discussion, random noise in power measurements can be filtered through the averaging process using a