208 IEEE TRANSACTIONS ON DEVICE AND MATERIALS RELIABILITY, VOL. 10, NO. 2, JUNE 2010
Robust Register Caching: An Energy-Efficient
Circuit-Level Technique to Combat Soft
Errors in Embedded Processors
Mahdi Fazeli, Student Member, IEEE, Alireza Namazi, and Seyed Ghassem Miremadi, Senior Member, IEEE
Abstract—This paper presents a cost-efficient technique that jointly uses circuit- and architecture-level techniques to protect an embedded processor’s register file against soft errors. The basic idea behind the proposed technique is robust register caching (RRC), which keeps the most vulnerable registers of the register file in a small, highly robust cache memory built from circuit-level single-event-upset-protected memory cells. A register is vulnerable to soft errors when it holds a value that will be used in subsequent cycles. To ensure that the most vulnerable registers are always stored in the robust register cache, the average number of read operations during a register’s lifetime is used as the metric that guides the cache replacement policy. Because the cache is built from protected cells, a register value is robust against both single- and multiple-bit upsets while it resides in the register cache. To minimize the power overhead of the RRC, clock gating is applied to the main register file, resulting in significantly reduced power consumption. The RRC was experimentally evaluated on the LEON processor using two benchmark suites: the MiBench embedded benchmark suite and the SPEC CPU2006 general-purpose benchmark suite. Our experimental results show that, if the cache size is selected appropriately, the architectural vulnerability factor (AVF) of the register file is significantly reduced while the power, area, and performance overheads remain low.
Index Terms—Embedded processors, fault tolerance, multiple-bit upsets (MBUs), register file, single-event upset (SEU).
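The two notions the abstract relies on, when a register is vulnerable and how often its value is read per lifetime, can be illustrated with a small trace-driven sketch. It is not the authors’ implementation: the (cycle, op, reg) trace format, the function names analyze_trace and pick_victim, and the rule of evicting the cached register with the lowest average reads per lifetime are hypothetical simplifications of the replacement policy described above.

```python
from collections import defaultdict

# Hypothetical trace format (assumption): (cycle, op, reg), sorted by cycle,
# where op is "W" (write) or "R" (read).
Trace = list[tuple[int, str, str]]


def analyze_trace(trace: Trace) -> dict[str, tuple[int, float]]:
    """Return {reg: (vulnerable_cycles, avg_reads_per_lifetime)}.

    A lifetime runs from a write to the next write of the same register.
    Within a lifetime, the register counts as vulnerable from the write
    until the last read of that value: a corrupted value matters only if
    it is read later. A value that is never read contributes 0 cycles.
    """
    write_cycle: dict[str, int] = {}    # start of the current lifetime
    last_read: dict[str, int] = {}      # last read in the current lifetime
    reads = defaultdict(int)            # reads in the current lifetime
    vulnerable = defaultdict(int)       # accumulated vulnerable cycles
    reads_per_life = defaultdict(list)  # read count of each closed lifetime

    def close_lifetime(reg: str) -> None:
        if reg in last_read:
            vulnerable[reg] += last_read.pop(reg) - write_cycle[reg]
        reads_per_life[reg].append(reads[reg])
        reads[reg] = 0

    for cycle, op, reg in trace:
        if op == "W":
            if reg in write_cycle:
                close_lifetime(reg)
            write_cycle[reg] = cycle
        elif op == "R" and reg in write_cycle:  # skip reads of unwritten regs
            reads[reg] += 1
            last_read[reg] = cycle

    for reg in write_cycle:  # close lifetimes still open at end of trace
        close_lifetime(reg)

    return {
        reg: (vulnerable[reg], sum(counts) / len(counts))
        for reg, counts in reads_per_life.items()
    }


def pick_victim(cached: list[str], avg_reads: dict[str, float]) -> str:
    """Hypothetical eviction rule: evict the cached register whose values
    are read least often on average (the presumed least vulnerable one)."""
    return min(cached, key=lambda reg: avg_reads.get(reg, 0.0))


# Usage on a tiny hand-made trace.
trace = [
    (0, "W", "r1"), (1, "W", "r2"), (2, "R", "r1"), (4, "R", "r1"),
    (5, "R", "r2"), (6, "W", "r1"), (9, "R", "r1"), (10, "W", "r3"),
]
stats = analyze_trace(trace)
avg = {reg: s[1] for reg, s in stats.items()}
print(stats)                                 # {'r1': (7, 1.5), 'r2': (4, 1.0), 'r3': (0, 0.0)}
print(pick_victim(["r1", "r2", "r3"], avg))  # r3
```

In this trace, r3 is written but never read, so it contributes no vulnerable cycles and is the first eviction candidate; r1, whose values are read most often, would be evicted last.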
I. INTRODUCTION
HIGH-ENERGY particle strikes have severely challenged the reliability of today’s embedded processors. Until recently, single-bit upsets (SBUs) in memory elements were thought to be the main effect of particle strikes. However, as technology scales to the nanometer regime, multiple-bit upsets (MBUs) are causing serious reliability problems [1]–[5]. An MBU is defined as the simultaneous upset of several adjacent bits caused by a single particle strike. In [1], it was reported that up to five bits may be upset in a 130-nm SRAM as a result of neutron strikes. In [6], it was estimated that for a commercial 130-nm SRAM, 7% of all upset events are MBUs, of which 92% are double-bit upsets and 8% are triple-bit upsets. These results indicate that MBUs must be taken into account when designing reliable systems.
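The figures from [6] can be combined into per-event and per-bit fractions. The worked arithmetic below assumes, as a simplification not stated in [6], that every non-MBU event is a single-bit upset and that double- and triple-bit upsets are the only MBU types.

```latex
\begin{align}
P(\text{1-bit}) &= 1 - 0.07 = 0.93, \quad
P(\text{2-bit}) = 0.07 \times 0.92 = 0.0644, \quad
P(\text{3-bit}) = 0.07 \times 0.08 = 0.0056 \\
E[\text{bits upset per event}] &= 0.93 \cdot 1 + 0.0644 \cdot 2 + 0.0056 \cdot 3 = 1.0756 \\
\text{fraction of upset bits belonging to MBUs} &= \frac{0.0644 \cdot 2 + 0.0056 \cdot 3}{1.0756}
  = \frac{0.1456}{1.0756} \approx 0.135
\end{align}
```

Under these assumptions, MBUs make up 7% of upset events but roughly 13.5% of all upset bits, so a scheme that corrects only single-bit errors leaves a larger share of corrupted bits uncovered than the event rate alone suggests.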
Manuscript received August 22, 2008; revised November 28, 2009; accepted
January 8, 2010. Date of publication February 5, 2010; date of current version
June 4, 2010. An earlier version of this paper appeared in the Proceedings of the 39th Annual IEEE/IFIP International Conference on Dependable Systems and Networks (DSN 2009) [38].
The authors are with the Department of Computer Engineering, Sharif
University of Technology, Tehran 11365-9363, Iran (e-mail: m_fazeli@ce.sharif.edu; namazi@ce.sharif.edu; miremadi@sharif.edu).
Digital Object Identifier 10.1109/TDMR.2010.2041234
The register file is a critical part of an embedded processor from a reliability perspective [7], [8]. Because the register file holds useful data that the processor accesses frequently, an error in a register is likely to propagate to other parts of the processor. The error-handling techniques proposed so far fall into two general categories. The first category comprises techniques based on detection and recovery mechanisms, such as parity bits and rollback recovery. Although these techniques are power efficient, they may violate the real-time requirements of embedded applications if recovery delays a task past its deadline. The second category comprises techniques based on error-correcting codes (ECCs) or on error masking, such as triple modular redundancy (TMR) [9]–[11]. Protecting the entire register file with ECCs or error masking is not appropriate for embedded applications because of the high power overhead. Moreover, ECCs that can detect and correct MBUs require more check bits (ECC bits) and therefore impose even larger area and power overheads [12]. In addition, reading and computing the ECC bits on every read operation adds performance and power overheads, and the problem worsens for register files with multiple read ports, where the checking logic is typically replicated for each port. To address this issue, [7] and [8] propose an SEU-tolerant technique that protects only the most vulnerable registers of the register file with a single-error-correcting, double-error-detecting (SECDED) code. This technique is based on the observation that not all registers are equally vulnerable to soft errors. In this paper, techniques that exploit the architectural vulnerability characteristics of the register file [7], [8], [13] are regarded as architecture-level techniques. However, although this technique incurs a lower power overhead than protecting the entire register file with an SECDED code, it cannot efficiently cope with MBUs, because an SECDED code corrects only single-bit errors and merely detects double-bit errors.
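To illustrate why an SECDED code alone struggles with MBUs, the following sketch implements a textbook extended Hamming code over 8 data bits (Hamming(12,8) plus an overall parity bit). The 8-bit width, the function names encode, decode, and flip, and the assumption that consecutive codeword positions map to physically adjacent cells (i.e., no bit interleaving) are illustrative choices, not details of [7] or [8].

```python
from typing import Optional

# Textbook extended Hamming (SECDED) code over 8 data bits, for illustration.
# Index 0 of a codeword is the overall parity bit; indices 1..12 form a
# Hamming(12,8) code with parity bits at the power-of-two positions.
DATA_POS = [3, 5, 6, 7, 9, 10, 11, 12]
PARITY_POS = [1, 2, 4, 8]


def encode(value: int) -> list[int]:
    """Encode an 8-bit value as a 13-bit SECDED codeword (list of 0/1)."""
    cw = [0] * 13
    for k, pos in enumerate(DATA_POS):
        cw[pos] = (value >> k) & 1
    for p in PARITY_POS:
        # Each parity bit covers the positions whose index has bit p set.
        cw[p] = sum(cw[i] for i in range(1, 13) if i & p and i != p) % 2
    cw[0] = sum(cw[1:]) % 2  # make the total number of ones even
    return cw


def decode(cw: list[int]) -> tuple[str, Optional[int]]:
    """Return (status, value); status is 'ok', 'corrected' or 'uncorrectable'."""
    cw = list(cw)
    syndrome = 0
    for i in range(1, 13):
        if cw[i]:
            syndrome ^= i
    odd = sum(cw) % 2  # 1 if an odd number of bits flipped
    if syndrome == 0 and not odd:
        status = "ok"
    elif odd and syndrome <= 12:
        # SECDED assumes an odd flip count means exactly one flipped bit.
        cw[syndrome] ^= 1  # syndrome 0 means the overall parity bit flipped
        status = "corrected"
    else:
        # Even, nonzero flip count (or impossible syndrome): detect only.
        return "uncorrectable", None
    value = sum(cw[pos] << k for k, pos in enumerate(DATA_POS))
    return status, value


def flip(cw: list[int], positions: set[int]) -> list[int]:
    """Model an upset of the given codeword positions."""
    return [b ^ int(i in positions) for i, b in enumerate(cw)]


word = encode(0xA5)
print(decode(word))                   # ('ok', 165)
print(decode(flip(word, {6})))        # ('corrected', 165)
print(decode(flip(word, {5, 6})))     # ('uncorrectable', None)
print(decode(flip(word, {5, 6, 7})))  # ('corrected', 171): silent miscorrection
```

A single-bit upset is corrected and a double-bit upset is at least detected, but the adjacent triple-bit upset yields a syndrome that points at a parity bit, so the decoder reports a successful correction while returning the wrong value (0xAB instead of 0xA5).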
One effective way to combat MBUs in the register file is to build it from circuit-level-protected, SEU-tolerant memory cells [11], [14]–[16]. Because each such cell is individually protected against particle strikes, a register file built from them remains robust even when a single strike disturbs several adjacent cells, i.e., against MBUs. However, this approach imposes rather high power and area overheads if it is used to protect the entire register file.
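The overhead argument above can be made concrete with a deliberately simple area model. It is a back-of-the-envelope sketch under stated assumptions, not a result from this paper: a register file of N registers of W bits each, a standard cell of area A, a hardened cell of area αA with α > 1, and a robust cache of k entries whose data and T = ceil(log2 N) tag bits are all stored in hardened cells; decoders, comparators, control logic, and power are ignored.

```latex
\begin{align}
\Delta A_{\text{full}}  &= N W (\alpha - 1) A
  && \text{(every cell replaced by a hardened one)} \\
\Delta A_{\text{cache}} &= k (W + T)\, \alpha A
  && \text{(hardened cache added beside the unhardened file)} \\
\frac{\Delta A_{\text{cache}}}{\Delta A_{\text{full}}}
  &= \frac{k (W + T)\, \alpha}{N W (\alpha - 1)}
\end{align}
```

For purely hypothetical values N = 32, W = 32, T = 5, k = 4, and α = 2, the ratio is (4 · 37 · 2)/(32 · 32 · 1) = 296/1024 ≈ 0.29, so the small hardened cache would cost roughly 29% of the extra area of hardening the whole file. Because the ratio grows linearly with k, the model also shows why the cache size must be chosen carefully.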
Although different architecture-level SEU-tolerant techniques based on ECCs or detection and recovery mechanisms have been proposed to protect register files, there are few studies that cost-efficiently integrate circuit- and architecture-level SEU-tolerant techniques such that they complement each
1530-4388/$26.00 © 2010 IEEE