Using Input-to-Output Masking for System-Level
Vulnerability Estimation in High-Performance Processors
Alireza Haghdoost¹, Hossein Asadi¹, Amirali Baniasadi²
¹Department of Computer Engineering, Sharif University of Technology, Tehran, Iran
²Department of Electrical and Computer Engineering, University of Victoria, Canada
haghdoost@ce.sharif.edu asadi@sharif.edu amirali@ece.uvic.ca
Abstract—In this paper, we enhance previously suggested
vulnerability estimation techniques by presenting a detailed
modeling technique based on Input-to-Output Masking (IOM).
Moreover, we use our model to compute the System-level
Vulnerability Factor (SVF) for data-path components in a
high-performance processor. As we show, recently suggested
estimation techniques overlook error masking and focus mainly
on the time periods in which an error could potentially propagate
in the system. We show that this view is incomplete because it
ignores the impact of masking. Our results show that including the
IOM factor can significantly affect the system-level vulnerability
for data-path components. As a case study, we analyze the IOM
factor for CPUs with different configurations. Our results show
that the average variation of the IOM factor across configurations
is less than 5%. Meanwhile, the IOM factor varies between 24% and
76% across the applications studied here. Accordingly, we find the
IOM factor to be mainly workload dependent and only weakly
configuration dependent.
Index Terms—System-Level Vulnerability, Architectural
Vulnerability Factor, High-Performance Processors, Fault Masking
Factor.
I. INTRODUCTION
The data integrity of high-end and mainstream processors is
threatened by cosmic and terrestrial energetic particles such as
cosmic-ray neutrons and alpha particles emitted by packaging
materials. These energetic particles can change the state of
storage elements such as flip-flops and SRAM cells within
processors and cause a transient error. Radiation-induced
transient errors, also called soft errors, occur more often than
hard errors in current VLSI technology [1], [2]. A recent study has
shown that soft errors can have a significant impact on the
data integrity of current microprocessors [3].
As technology continues to scale down and the number of
transistors per chip continues to grow, the soft error rate
per chip is expected to increase for the next several years [1].
Accordingly, designers will need to incorporate aggressive
protection techniques in future microprocessor designs. An
important aspect of designing cost-effective protection
techniques is developing accurate soft error vulnerability models
for individual components. Such models help designers understand
the extent of vulnerability of data-path components such as
caches, register files, and load/store queues before developing
protection techniques. An accurate model for such components
supports informed decisions about the level of protection needed
across data-path components and target workloads. Choosing the
right protection level for data-path structures reduces the
probability of data loss and therefore increases system
reliability.
A recent field study of several thousand systems indicates
that, in current processor technology, the majority of
system reboots are initiated by single event upsets (SEUs)
occurring in data-path components such as caches and register
files [3]. Errors in such structures can easily propagate to
the system outputs and can significantly reduce system
reliability. In particular, cache reliability is of high
importance, as errors occurring in the data cache can propagate
to higher memory levels and can easily lead to data integrity
issues [4], [5]. While designing caches with low access time
and miss rate is an important goal, maintaining low power
dissipation and high reliability has also become necessary. This
is particularly true for high-end and mainstream processors,
where reliability has always been a vital concern.
Previous studies have introduced analytical models to compute
the vulnerability of data-path components such as caches and
register files to SEUs [6], [7], [8], [9]. Such models often
provide fast estimates but suffer from inaccuracies, as the
system-level impact of soft errors is not taken into account in
these models. More accurate approaches, such as fault injection
(FI) strategies [10], [11], [12], [13], are both time-consuming,
due to the large number of runs required, and still prone to
inaccuracy, due to the limited number of addresses targeted.
The goal of this study is to introduce a new vulnerability
estimation technique that improves on the accuracy of previous
estimation methods while keeping estimation time low. We do so by
taking into account an important parameter ignored by earlier
studies. Previous studies mainly rely on measuring the time period
in which an error occurring in a data block could potentially
propagate in the system, also referred to as the critical time,
to estimate vulnerability. While critical time is an important
factor, it is not the only one.
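
To make the effect of masking concrete, the following toy Python sketch flips each input bit of a small component and counts how often the output is unchanged. It is only an illustration under stated assumptions, not the modeling technique developed in this paper: the 8-bit input width, the exhaustive single-bit flips, and the names `masking_fraction` and `low_nibble` are all hypothetical and introduced here for the example.

```python
from typing import Callable, Iterable


def masking_fraction(component: Callable[[int], int],
                     inputs: Iterable[int],
                     width: int = 8) -> float:
    """Fraction of single-bit input flips that leave the output unchanged.

    Assumption: the component is a pure function of one `width`-bit input,
    and every (input, bit) pair is equally likely to be hit.
    """
    masked = 0
    total = 0
    for value in inputs:
        golden = component(value)          # fault-free output
        for bit in range(width):
            faulty = value ^ (1 << bit)    # inject a single-bit upset
            total += 1
            if component(faulty) == golden:
                masked += 1                # error did not reach the output
    return masked / total


def low_nibble(x: int) -> int:
    """Hypothetical component: only the low 4 bits reach the output."""
    return x & 0x0F


if __name__ == "__main__":
    # Flips in bits 4..7 never reach the output, so half of all flips are masked.
    print(masking_fraction(low_nibble, range(256)))  # 0.5
```

In this toy case an estimate based only on how long the input stays live would count every flip as harmful, although half of them never reach the output.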
In this work, we present a modeling technique based on
the Input-to-Output Masking (IOM) factor. We define the IOM
factor of a component as the percentage of errors masked when
propagating erroneous values from the inputs to the outputs
of the component. We present a technique to compute the
IOM factor of components for a high-performance processor.
Using the IOM factor, we also present a modeling technique
to estimate the Component-level Vulnerability Factor (CVF)
and the System-level Vulnerability Factor (SVF) of the data-
path components of a high-performance processor. We define
91 978-1-4244-6268-8/10/$26.00 ©2010 IEEE