Compiler-Managed Register File Protection for Energy-Efficient Soft Error Reduction *

Jongeun Lee, Aviral Shrivastava
Department of Computer Science and Engineering
Arizona State University, Tempe, AZ 85281, USA
{jongeun.lee, aviral.shrivastava}@asu.edu

Abstract— For embedded systems where neither energy nor reliability can be easily sacrificed, we present an energy-efficient soft error protection scheme for register files (RF). Unlike previous approaches, our method explicitly optimizes for energy efficiency and exploits the fundamental tradeoff between reliability and energy. While even a simple compiler-managed RF protection scheme is more energy efficient than hardware schemes, this work formulates and solves further compiler optimization problems to significantly enhance the energy efficiency of RF protection schemes by an additional 24%.

I. INTRODUCTION

Power density and reliability have risen to become the most important design concerns in the sub-nanometer fabrication era. On the one hand, power density has increased so much that we cannot operate processors at the maximum possible clock frequency determined by design; on the other hand, the basic computational units, i.e., transistors, have become extremely susceptible to soft errors. Even a slight variation in signal voltage, noise in the power supply, or a cosmic particle strike can toggle the logic value of a transistor, eventually causing a system failure [1]. There is a clear need for techniques that mitigate the impact of soft errors at minimal power overhead. This need is aggravated by the fact that the soft error rate increases exponentially with temperature. The register file (RF) is most affected by both of these tightly coupled effects, since it is both the hottest component in the processor [2] and extremely susceptible to soft errors [3].

The earliest forms of register file protection include ECC and parity checking [4].
However, error checking, especially that based on ECC, has a large overhead in terms of area, runtime, and energy [5, 6]. While the latency of ECC operations can be hidden by parallelizing ECC with other operations, the area and energy overhead cannot be. To reduce the area overhead of RF protection, later techniques protect only a part of the RF [7]. To further reduce the overhead of ECC, later schemes simply replicate the registers they intend to protect. Blome et al. [3] use a small cache to store duplicates of recently accessed register values, so that a simple comparison on every read operation can detect errors in registers. Memik et al. [8] propose a technique to replicate some of the register values in unused physical registers in the context of superscalar processors, where there are a number of physical registers and register binding is done at runtime. Another interesting variation on register replication is in-register replication [9], which exploits the fact that typically a large fraction of register values are narrower than half the register width, or 16 bits. Such values can be replicated in the same register, requiring no significant extra hardware.

* This work is partially supported by grants from Microsoft, Raytheon and Stardust Foundation.

All these microarchitectural techniques have a persistent power overhead associated with error checking. Compiler techniques, on the other hand, promise very power-efficient RF protection, either on their own, by instruction scheduling or register reallocation to shorten the live ranges of the variables stored in the RF [6], or by enhancing the effectiveness of microarchitectural partial RF protection schemes [3, 8, 7] by deciding at compile time which registers to protect.
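
To make the in-register replication idea [9] mentioned above concrete, the following is a minimal sketch, not the mechanism of [9] itself. It assumes a 32-bit register, treats values as unsigned, and uses a hypothetical one-bit `replicated` tag to mark registers that hold a duplicated narrow value; the names `Register`, `write`, and `read` are illustrative only.

```python
from dataclasses import dataclass
from typing import Tuple

REG_WIDTH = 32                      # assumed register width in bits
HALF = REG_WIDTH // 2               # width of a "narrow" value (16 bits)
HALF_MASK = (1 << HALF) - 1
FULL_MASK = (1 << REG_WIDTH) - 1


@dataclass
class Register:
    bits: int = 0
    replicated: bool = False        # hypothetical tag bit (an assumption)


def write(value: int) -> Register:
    """Store a value; duplicate it in both halves if it is narrow."""
    value &= FULL_MASK
    if value <= HALF_MASK:
        # Narrow (unsigned) value: keep one copy in each half of the register.
        return Register(bits=(value << HALF) | value, replicated=True)
    # Wide value: no room for a copy, so it is stored unprotected.
    return Register(bits=value, replicated=False)


def read(reg: Register) -> Tuple[int, bool]:
    """Return (value, error_detected) for a register."""
    if not reg.replicated:
        return reg.bits, False
    low = reg.bits & HALF_MASK
    high = (reg.bits >> HALF) & HALF_MASK
    # A mismatch between the two copies signals a bit flip in one of them.
    return low, low != high


# Usage: a narrow value is duplicated, and a single bit flip is detected.
r = write(0x1234)
r.bits ^= 1 << 3                    # inject a transient fault in the low copy
value, error = read(r)
```

In this sketch, detection costs one comparison per read of a replicated register, which illustrates the persistent checking overhead that such microarchitectural schemes carry. Sign-extended narrow values are ignored here for simplicity.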
This work develops a compile-time analysis that explicitly incorporates RF power consumption to come up with a register renaming that can be used both in existing pure-compiler techniques and to enhance existing microarchitectural schemes to achieve power-efficient RF protection. Our experimental results on embedded application benchmarks from MiBench [12] indicate that even the simplest of such compiler-management schemes can be more energy efficient than hardware schemes. In addition, our explicit optimizations can further increase the energy efficiency by 24% on average, as measured by our cost metric on register file reliability and energy overhead.

II. BACKGROUND AND MOTIVATION

We use vulnerability as the measure of reliability. Following the architectural vulnerability factor [10], the vulnerability of a register is defined as the combined lifetime (or the sum of the live range lengths) of the variables assigned to it. The live range of a variable extends from its definition until its last use and represents the time when useful data is present in the register. Any transient fault occurring in the register during that time period therefore destroys data integrity and can manifest itself as an error. Thus, given the same transient fault rate, vulnerability can be used to predict the soft error rate. The vulnerability of a register file is simply the sum of the vulnerabilities of all its registers. Vulnerability is fully determined by the register access pattern, and can be controlled by changing the program.

Researchers have found that not all registers are always vulnerable, and thus significant vulnerability reduction can be achieved even with only K protected registers, where K is less than R, the number of registers, or K entries of redundant information for pro-