Soft Error Tolerance using Horizontal-Vertical-Double-Bit Diagonal Parity Method

Md. Shamimur Rahman, Muhammad Sheikh Sadi, Sakib Ahammed
Dept. of Computer Science & Engineering
Khulna University of Engineering & Technology
Bangladesh
e-mail: shamimur052@gmail.com, sheikhsadi@gmail.com, sakib.kuet10@gmail.com

Jan Jurjens
Dept. of Computer Science
TU Dortmund
Germany
e-mail: jan.juerjens@isst.fraunhofer.de

Abstract—The likelihood of soft errors increases with system complexity, reductions in operating voltage, exponential growth in the number of transistors per chip, increases in clock frequency, and device shrinking. As the memory bit-cell area shrinks, single event upsets that would formerly have corrupted only a single bit-cell are now capable of upsetting multiple contiguous memory bit-cells per particle strike. While these error types are beyond the error handling capabilities of the commonly used single-bit error correction codes (ECCs), the overhead associated with moving to more sophisticated codes for multi-bit errors is considered too costly. To address this issue, this paper presents a new approach that detects and corrects multi-bit soft errors by using Horizontal-Vertical-Double-Bit-Diagonal (HVDD) parity bits with comparatively low overhead.

Keywords—Soft Error Tolerance, Horizontal Parity, Vertical Parity, Double-Bit Diagonal Parity.

I. INTRODUCTION

The advent of new implementation technologies, along with non-functional constraints (dependability, timeliness, power, cost and time-to-market), has made the design of embedded systems more challenging and complex [1]. The complexity of such systems is growing rapidly through the integration of mixed-criticality applications (both safety-critical and non-safety-critical) and through the high degree of interaction among distributed applications.
Embedded systems engineering has evolved from designing single-CPU systems (the System-on-a-Chip (SoC) concept) to concurrent computing structures (the Multi-Processor-System-on-a-Chip (MPSoC) and Network-on-a-Chip (NoC) concepts), often in the form of a distributed network [2], [3].

When designing high-availability systems for use in electronically hostile environments, errors are a matter of great concern. Space programs and patient-condition monitoring systems in intensive care units (ICUs), where a system cannot afford a malfunction, are vulnerable to soft errors [4]. Nuclear power monitoring systems, where a single failure may cause severe destruction, and real-time systems, where a missed deadline can constitute an erroneous action and a possible system failure, are further examples where soft errors are a critical issue [5], [6], [7]. The impact of soft errors is such that action is needed to increase a system's tolerance of them or to lower the risk of their occurrence.

Prior research into soft errors has focused primarily on circuit-level solutions, logic-level solutions, spatial redundancy and temporal redundancy [8]. However, in all of these cases the system remains vulnerable to soft errors in key areas. Further, in software-based approaches, the complex use of threads presents a difficult programming model. Hardware and software duplication suffers not only from overheads due to synchronizing duplicate threads, but also from inherent performance overheads due to additional hardware. Hardware-based protection techniques based on duplication often suffer from high area, time and power overheads [9], [10], [11].

Various types of error detection and correction codes are used in computers. For example, in satellite applications, Hamming code [12] and different types of parity codes are used to protect memory devices. More complex codes are not applied because of time constraints and the limited resources available on board [13].
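As an illustration of how simple parity codes can protect a memory block, the following minimal Python sketch computes one parity bit per row and per column of a small bit matrix and uses them to locate and correct a single flipped bit. It shows plain two-dimensional (horizontal and vertical) parity only, not the HVDD method itself; the even-parity convention, the 4x4 block size and all function names are assumptions made for this example.

```python
from typing import List, Optional, Tuple

Block = List[List[int]]  # rectangular matrix of 0/1 values


def row_parities(block: Block) -> List[int]:
    """Even parity bit for each row (horizontal parity)."""
    return [sum(row) % 2 for row in block]


def column_parities(block: Block) -> List[int]:
    """Even parity bit for each column (vertical parity)."""
    return [sum(col) % 2 for col in zip(*block)]


def locate_single_error(
    block: Block, stored_rows: List[int], stored_cols: List[int]
) -> Optional[Tuple[int, int]]:
    """Return (row, column) of a single flipped bit, or None.

    None is returned when no parity mismatch is found, or when the
    mismatch pattern is not the one produced by exactly one bit flip.
    """
    bad_rows = [i for i, (now, old) in
                enumerate(zip(row_parities(block), stored_rows)) if now != old]
    bad_cols = [j for j, (now, old) in
                enumerate(zip(column_parities(block), stored_cols)) if now != old]
    if len(bad_rows) == 1 and len(bad_cols) == 1:
        return bad_rows[0], bad_cols[0]
    return None


def correct_single_error(
    block: Block, stored_rows: List[int], stored_cols: List[int]
) -> bool:
    """Flip the located bit in place; return True if a correction was made."""
    position = locate_single_error(block, stored_rows, stored_cols)
    if position is None:
        return False
    r, c = position
    block[r][c] ^= 1
    return True


if __name__ == "__main__":
    data = [[1, 0, 1, 1],
            [0, 1, 1, 0],
            [1, 1, 0, 0],
            [0, 0, 1, 0]]
    h, v = row_parities(data), column_parities(data)

    corrupted = [row[:] for row in data]
    corrupted[2][1] ^= 1  # simulate a single-bit upset
    print(locate_single_error(corrupted, h, v))
    print(correct_single_error(corrupted, h, v), corrupted == data)
```

With row and column parity alone, some multi-bit upset patterns yield ambiguous or empty mismatch sets (for example, two flips in the same row cancel in that row's parity), so the sketch deliberately refuses to correct anything other than the single-bit pattern.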
There are other methods for error detection and correction, such as parity codes, rectangular parity codes, BCH codes [14], N-dimensional parity codes [15], and Golay codes [16], whose error detection and correction rates vary from method to method. However, the majority of these methods still suffer from a low error detection and correction rate, a high information overhead, or both. For this reason, further research is needed to increase the error detection and correction rate with minimal overhead.

In this paper, a high-level error detection and correction method to protect against soft errors is proposed. The method is based on a parity bit for each row and each column, together with double-bit diagonal parity in the backward-slash direction. The HVDD method provides a high error detection and correction rate and can correct up to three bit upsets in a data block with low bit overhead.

The rest of this paper is organized as follows. We provide some related work in Section II. The proposed methodology is