Run Time Management of Faulty Data Caches
Michail Mavropoulos, Dept. of Computer Eng. & Informatics, University of Patras, Patras, Greece, mavropoulo@ceid.upatras.gr
Georgios Keramidas, Dept. of Informatics, Aristotle University Thessaloniki, Greece, gkeramidas@csd.auth.gr
Dimitris Nikolos, Dept. of Computer Eng. & Informatics, University of Patras, Patras, Greece, nikolosd@ceid.upatras.gr
Abstract—As technology continues to shrink, power consumption has become the dominant design parameter. Operation at low voltages mainly affects on-chip memories, resulting in multiple malfunctioning memory cells. In response, many cache fault tolerance (CFT) mechanisms have been proposed to mitigate the resulting performance degradation. The challenge is to devise mechanisms that are tailored to the memory access patterns of the executing applications. In this work, we first investigate the impact of the granularity of the cache line disabling scheme in first level data caches. Based on our analysis, we propose a run-time adaptive mechanism that selects the cache (sub-)block granularity taking into account the diverse memory characteristics of the application. The proposed mechanism builds on the widely used block (sub-block) disabling scheme and dynamically selects the appropriate sub-block granularity during the execution of the applications. Our evaluation results reveal that the proposed dynamic approach offers significant benefits over a faulty cache design with a monolithic (sub-)block granularity.
Keywords— Cache Fault Tolerance, Re-configurable caches
I. INTRODUCTION
Reducing the supply voltage in today's process technologies introduces significant reliability challenges for on-chip SRAM arrays. This is particularly true as the silicon industry moves into the near-threshold region, which is characterized by high fault probabilities [7]. On-chip caches are built with minimum-sized SRAM cells (to reduce leakage power), which are thus more prone to failure [3].
Resilience roadmaps pinpoint the vulnerability problem in SRAM cells [16]. As a result, a vast portion of the on-chip memory resources will become unreliable, leading to stochastic designs due to increases in static [4] and dynamic [5] variations, wear-out failures [24], and manufacturing defects [3]. Therefore, it becomes critical to investigate new CFT techniques [13][14][17][22]. Obviously, these techniques have to be both lightweight and performance effective, especially when the target caches are close to the core (e.g., L1 caches). A broad category of CFT designs, named after the term graceful degradation [18], has gained much attention from both researchers and industry. The underlying idea is to disable cache portions, such as cache ways, that include malfunctioning memory cells, and to apply various schemes that reduce the consequences of the disabled cache portions [2][13][17][22][23]. A detailed analysis of previously proposed techniques is presented in Section III. A particularly attractive scheme, owing to its simplicity, was presented in [1]. The authors introduced the concept of sub-block disabling: instead of relying on complex cache restructuring or block remapping approaches, each cache line is divided into four parts (called sub-blocks) and a separate bit (called a fault bit) is assigned to each sub-block. Sub-block disabling (called SBDIS hereafter) allows data to be kept in cache lines even if some of their sub-blocks are faulty. By tracking which sub-blocks are not faulty, hits in those sub-blocks can be detected. SBDIS is clearly a low-overhead CFT technique (with less than 0.19% area overhead) [1]. Figure 1 shows the impact of the granularity of the (S)BDIS scheme on the fault-free cache area (y-axis) for five percentages of failures (pfails), assuming a 32KB, 8-way cache with 64-byte blocks. In BDIS, one fault bit covers the whole cache frame (1 fault bit per 64B).
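Before turning to the finer granularities of Figure 1, the fault-bit lookup just described can be sketched as follows. This is a minimal illustrative model, not the paper's hardware implementation: the function name, the boolean fault-bit list, and the four-sub-block example layout are our own assumptions.

```python
# Minimal sketch of the SBDIS lookup: a line carries one fault bit per
# sub-block (True = the sub-block contains a faulty cell and is disabled);
# an access hits only if the tag matches AND the touched sub-block is sound.

LINE_SIZE = 64  # bytes per cache line, as in the paper's configuration

def sbdis_hit(tag_match, offset, fault_bits, num_subblocks):
    """Return True only if the access lands in a non-faulty sub-block."""
    if not tag_match:
        return False
    subblock_size = LINE_SIZE // num_subblocks
    subblock = offset // subblock_size
    return not fault_bits[subblock]

# Example: an SBDIS4 line (four 16B sub-blocks) whose second sub-block is faulty.
faults = [False, True, False, False]
print(sbdis_hit(True, 8,  faults, 4))   # offset 8  -> sub-block 0: True (hit)
print(sbdis_hit(True, 20, faults, 4))   # offset 20 -> sub-block 1: False (disabled)
```

Finer granularities simply use longer `fault_bits` vectors (2, 4, or 8 bits per line for SBDIS2/4/8), which is why the area overhead stays below 0.19% [1].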
In SBDIS2/SBDIS4/SBDIS8, one fault bit is assigned to every 32/16/8B sub-block, respectively. Figure 1 makes clear that smaller sub-block granularities expose a larger effective cache capacity at the microarchitectural level, and this trend becomes more pronounced at higher pfails. For example, at a pfail of 5e-04 (five malfunctioning cells per 10^4 memory cells), BDIS leaves 75% of the cache area fault-free (36% at 2e-03), while the SBDIS8 scheme increases the sound area to 96% (85% at 2e-03).
The main contributions of this work are:
- We examine the cache behavior for a wide range of pfails and for two benchmark suites by applying the BDIS technique at different block granularities. Our analysis reveals that no single BDIS granularity performs best across all pfails, fault maps, and studied benchmarks.
- Based on the above observation, we propose a simple mechanism that selects the best performing cache block granularity according to the memory behavior of the application. The proposed mechanism decides at run time whether there is an opportunity to reduce the number of misses and dynamically adjusts the BDIS granularity. Note that the SBDIS scheme in [1] relies on only four (statically defined) sub-blocks.
An inherent drawback of the proposed mechanism is that it relies on cache flushes in order to select the appropriate (S)BDIS scheme. To address this, we introduce an additional
Figure 1: Effective (fault-free) cache size for different percentages of malfunctioning cells.
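The granularity trend reported above also follows from a simple back-of-the-envelope model. Assuming each SRAM cell fails independently with probability pfail (an assumption of ours; the paper's numbers come from concrete fault maps, so exact values differ slightly), a sub-block of b bits is usable with probability (1 - pfail)^b, which is also the expected fault-free fraction of the cache:

```python
# Analytic sketch of the Figure 1 trend under independent, uniform cell
# failures: a sub-block survives only if all of its 8 * subblock_bytes
# cells are sound, so the expected fault-free fraction is (1 - pfail)^bits.

def fault_free_fraction(pfail, subblock_bytes):
    bits = 8 * subblock_bytes
    return (1 - pfail) ** bits

# Granularities from the paper: 64B line (BDIS) down to 8B sub-blocks (SBDIS8).
for name, size in [("BDIS", 64), ("SBDIS2", 32), ("SBDIS4", 16), ("SBDIS8", 8)]:
    print(name, round(100 * fault_free_fraction(5e-4, size)), "%")
```

At pfail = 5e-04 this model yields roughly 77% for BDIS and 97% for SBDIS8, close to the 75% and 96% read off Figure 1; the gap at higher pfails (e.g., 2e-03) is likewise reproduced, since the exponent shrinks with the sub-block size.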