978-1-4799-5944-0/14/$31.00 © 2014 IEEE

Context-Aware Resources Placement for SRAM-based FPGA to minimize Checkpoint/Recovery overhead

Sahraoui Fouad∗, Fakhreddine Ghaffari∗, Mohamed El Amine Benkhelifa∗ and Bertrand Granado‡
∗ ETIS, CNRS UMR 8051, ENSEA, Université Cergy-Pontoise; FRANCE
‡ LIP6, UPMC, CNRS UMR 7606; FRANCE

Abstract: Existing SRAM-based Field Programmable Gate Arrays (FPGAs) are very sensitive to Single Event Effects (SEE) in harsh environments. To protect applications running on SRAM-based FPGAs from SEE, designers mainly rely on resource redundancy approaches, which involve significant resource overhead. Recently proposed fault mitigation approaches use Partial Dynamic Reconfiguration (PDR) to avoid this overhead. In [1], a Backward Error Recovery (BER) approach based on PDR is proposed. Nevertheless, this approach suffers from significant time latency. In this paper, we introduce a new context-aware resources placement strategy to minimize the time overhead induced by the BER fault mitigation approach. Both checkpoint and recovery overheads are evaluated with and without our context-aware resources placement strategy. A reduction of up to 71% in context frames is reported.

Keywords: Fault Tolerance, SRAM-based FPGA, Reliability, Resources Placement, Backward Error Recovery.

I. INTRODUCTION

Over the past years, SRAM-based Field Programmable Gate Arrays (FPGAs) have progressed significantly from prototyping platforms to execution platforms. This progress came mainly from their attractive features, such as high resource availability, high execution speed and complete/partial reconfiguration capability. The use of SRAM technology to store the configuration data, called the bitstream, was also a key feature for this advance.
The bitstream configures functional elements such as Look-Up Tables (LUTs), Flip-Flops (FFs), Block RAMs (BRAMs), Digital Signal Processors (DSPs) and the routing wires/matrices that link those elements together.

Despite this progress, SRAM-based FPGAs are still very sensitive to faults, which can limit their widespread use in harsh environments and safety-critical domains such as aerospace and avionics. This weakness is especially significant in the configuration memory, which represents more than 80% of the total memory inside the FPGA (the remainder consists of BRAMs and FFs), and it can lead to erroneous executions and wrong behaviors of the system.

When integrated circuits such as FPGAs are exposed to cosmic rays or energetic particles, perturbations of their electrical behavior may occur. Those perturbations are known as Single Event Effects (SEE) and can lead to different fault models [2]. The most likely fault models in FPGAs are Single-Event Upsets (SEUs) and Multiple-Bit Upsets (MBUs).

To cope with this weakness, many Fault Tolerant (FT) methods have been proposed to enhance the reliability of FPGA-based systems [3]. A great number of those methods are based on redundancy, such as Triple Modular Redundancy (TMR) or Duplication with Comparison (DWC). When redundancy methods are used, more resources are needed to mask or detect faults. The overhead varies from 200% to 400% compared to the initial system [4], in addition to the overhead introduced by the voters needed to validate intermediate/final results.

The works in [5]–[7] aim to reduce this added overhead by finding a compromise between the performance of the hardened system and the resource overhead, based on other approaches. However, because redundancy methods only mask faults, they suffer from several weaknesses, such as fault accumulation in replicas [4], MBUs affecting multiple replicas at the same time, or faults occurring in voters [8].
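The masking behavior of TMR, and the way it breaks down when more than one replica is corrupted, can be illustrated with a minimal software sketch. This is only a model written for illustration, not the hardware voter of any cited work: the function names `majority_vote` and `flip_bit`, the 8-bit `golden` value and the choice of bit 3 are all assumptions introduced here.

```python
def majority_vote(a: int, b: int, c: int) -> int:
    """Bitwise 2-of-3 majority of three replica outputs (hypothetical helper)."""
    return (a & b) | (a & c) | (b & c)


def flip_bit(word: int, bit: int) -> int:
    """Model a single bit upset by inverting one bit of a word (hypothetical helper)."""
    return word ^ (1 << bit)


# Assumed fault-free output shared by the three replicas.
golden = 0b1011_0010

# One upset in one replica: the two healthy replicas outvote it.
single_fault = majority_vote(golden, flip_bit(golden, 3), golden)

# The same bit upset in two replicas (accumulated faults, or an MBU
# spanning replicas): the voter now agrees on the wrong value.
double_fault = majority_vote(golden, flip_bit(golden, 3), flip_bit(golden, 3))

print(single_fault == golden)  # masked fault
print(double_fault == golden)  # unmasked fault
```

In this model the voter hides the first upset without repairing the faulty replica, so a second upset on the same bit in another replica is enough to corrupt the voted output.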
Alternatives to redundancy try to take advantage of the Partial Dynamic Reconfiguration (PDR) feature. One example is Configuration Scrubbing [9], where a golden bitstream is periodically written to the SRAM memory to eliminate any bit upset that may have occurred. Scrubbing is generally combined with Error Detection and Correction (EDAC) codes, which can minimize read/write overhead by localizing and correcting only the erroneous part of the bitstream.

Although those alternative methods enhance application reliability on SRAM-based FPGAs, they are inappropriate for certain types of applications where both the bitstream and the context must be corrected to avoid fault propagation; Evolutionary Algorithms (EAs) are one example. More generally, any application whose results improve over execution time (generation after generation, or iteration after iteration) needs to be protected with methods that take its evolving behavior into account. Recent research [10]–[13] shows that FPGA-based platforms are becoming increasingly attractive as execution platforms for such applications in uncertain and harsh environments.

To protect those types of applications against transient faults occurring in the configuration layer of SRAM-based FPGAs, we propose the use of Backward Error Recovery (BER) as a fault mitigation mechanism [1]. We continually verify the correctness of the system configuration and save several