2742 IEEE TRANSACTIONS ON NUCLEAR SCIENCE, VOL. 60, NO. 4, AUGUST 2013

Radiation and Fault Injection Testing of a Fine-Grained Error Detection Technique for FPGAs

Gabriel L. Nazar, Paolo Rech, Christopher Frost, and Luigi Carro

Abstract: We present the experimental evaluation of a fine-grained hardening approach that exploits underused and abundant resources found in state-of-the-art SRAM-based FPGAs to detect radiation-induced errors in configuration memories. The technique's main goal is to provide the benefits of fine-grained redundancy, namely improved diagnosis and reduced error latency, with a reduced area overhead. Neutron experiments, validated with fault injection campaigns, demonstrate the proposed technique's efficiency when compared with traditional dual modular redundancy.

Index Terms: Fault tolerance, field-programmable gate arrays (FPGAs), neutron radiation effects.

I. INTRODUCTION

Field-Programmable Gate Arrays (FPGAs) have seen great success over the past years due to their high performance, flexibility, and fast time-to-market. Moreover, the possibility of reprogramming the device after deployment allows the addition of new functionalities or the correction of design bugs, extending the system's lifetime. Despite these advantages, FPGA utilization in critical systems has been limited due to reliability issues. With the aggressive scaling of transistor feature sizes, radiation-induced Single Event Effects (SEEs) became a major threat to the reliability of electronic devices. While this concern was more prominent in radiation-harsh environments, such as space, recent technologies may suffer from SEEs even in terrestrial applications [1]. It is therefore crucial to experimentally characterize the susceptibility of a device to these effects.
Manuscript received September 28, 2012; revised February 04, 2013; accepted April 30, 2013. Date of publication May 31, 2013; date of current version August 14, 2013. This work was supported by the CAPES foundation of the Ministry of Education, the CNPq research council of the Ministry of Science and Technology, and the FAPERGS research agency of the State of Rio Grande do Sul, Brazil. Experiments were performed in ISIS, Rutherford Appleton Laboratories, Didcot, U.K., and funded by Science and Technology Facilities Council.
G. L. Nazar, P. Rech, and L. Carro are with the Instituto de Informática, Universidade Federal do Rio Grande do Sul (UFRGS), Porto Alegre, RS 91509-900, Brazil (e-mail: glnazar@inf.ufrgs.br; prech@inf.ufrgs.br; carro@inf.ufrgs.br).
C. Frost is with the ISIS, Rutherford Appleton Laboratories, Didcot OX11 0QX, U.K. (e-mail: christopher.frost@stfc.ac.uk).
Digital Object Identifier 10.1109/TNS.2013.2261319

As SRAM-based FPGAs have their functionality stored in large memory arrays, which represent the vast majority of the storage cells in the device, Single Event Upsets (SEUs) or Multiple Bit Upsets (MBUs) affecting configuration cells are a major concern for the overall system reliability. Evaluating the effects of such faults in FPGAs is, hence, crucial to enable their use in critical systems. The two main means to do so are fault injection and accelerated radiation experiments, which are often complementary approaches. The first is able to inject a much larger number of faults in a short period of time, while the second allows a more accurate evaluation of the effects of radiation on the device.

Fault injection can be performed at various abstraction levels, from high-level models down to an actual silicon device. Specifically for FPGAs, due to the complex and frequently unpredictable effects of configurations that were not foreseen by manufacturers, the use of high-level models or simulation software can become too complex or inaccurate.
Furthermore, detailed low-level schematics of the device are usually not available to users, increasing the complexity of assessing the purpose of each configuration bit. Thus, fault injection is usually performed directly in an actual FPGA, with different approaches [2]–[7]. Injecting faults directly into the FPGA has the additional benefit of greatly reducing the total experiment time, as the circuit under test runs at full speed.

However, although fault injection experiments are suitable to quickly determine relevant metrics such as fault coverage, they are unable to directly measure properties such as cross-section or failure rate, as no physical disturbance is suffered by the device. Thus, radiation experiments are relevant to measure the actual susceptibility of a device to such effects. In this work, we report the results of neutron experiments conducted to estimate the cross-section and failure rates attainable with a fine-grained error detection technique, relative to those of a traditional coarse-grained approach. These experiments are valuable to validate the fault injection campaigns conducted on circuits with the technique [8], [9].

Fine-grained redundancy techniques as a means to mitigate transient faults in FPGAs have been proposed in several works [8]–[14]. Among the main advantages of such techniques are the ability to quickly detect faults and the improved diagnosis information provided by fine-grained comparators, as the output of each Lookup Table (LUT) can be compared to a replica. These features have great potential to reduce the repair time of faults affecting configuration bits. As such faults are usually removed by means of scrubbing [15], the time required to traverse the configuration memory limits the attainable repair time, which is usually on the order of several milliseconds and may be too long for critical real-time applications.
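The idea of comparing each LUT output to a replica can be illustrated with a small software model. The following is only a minimal sketch under stated assumptions, not the technique evaluated in this paper: a LUT is modeled as a list of configuration bits, an SEU as a single bit flip in that list, and the comparator as a plain inequality test. The carry-chain implementation of the comparators is not modeled. All function names (`lut_eval`, `flip_bit`, `detecting_inputs`, `injection_campaign`) and the 4-input parity function are hypothetical choices made for illustration.

```python
from itertools import product


def lut_eval(config_bits, inputs):
    """Evaluate a k-input LUT: the inputs (MSB first) address one configuration bit."""
    index = 0
    for bit in inputs:
        index = (index << 1) | bit
    return config_bits[index]


def flip_bit(config_bits, position):
    """Return a copy of the configuration with one bit upset (simple SEU model)."""
    faulty = list(config_bits)
    faulty[position] ^= 1
    return faulty


def detecting_inputs(lut, replica, k):
    """Input vectors for which a comparator on the two LUT outputs sees a mismatch."""
    return [v for v in product((0, 1), repeat=k)
            if lut_eval(lut, v) != lut_eval(replica, v)]


# Hypothetical example function: 4-input parity (XOR of all inputs).
K = 4
GOLDEN = [sum(v) % 2 for v in product((0, 1), repeat=K)]


def injection_campaign(golden, k):
    """Flip each configuration bit in turn and record which inputs expose the fault."""
    results = {}
    for pos in range(len(golden)):
        faulty = flip_bit(golden, pos)
        results[pos] = detecting_inputs(faulty, golden, k)
    return results
```

In this toy model, each flipped bit is exposed by exactly one input vector, the one that addresses the corrupted bit. A comparator attached directly to the LUT output flags the fault in the same evaluation in which that vector is applied, whereas a check at the circuit outputs would flag it only if, and when, the wrong value propagates that far.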
Furthermore, once the error has propagated to sensitive parts of the user circuit, such as feedback structures, even its removal from the configuration may not restore the circuit functionality [11].

0018-9499 © 2013 IEEE

The main disadvantage of fine-grained techniques is the area cost of the additional voters or comparators. In this work we use a technique that exploits a typically very abundant and underused resource of FPGAs, namely the carry propagation chain, to implement fine-grained comparators [8], [9]. This circuit is already included in the basic configurable blocks of most state-of-