Dynamic Associative Caches: Reducing Dynamic Energy of First Level Caches
Karthikeyan Dayalan, Meltem Ozsoy, Dmitry Ponomarev
State University of New York at Binghamton
Email: {kdayala1,mozsoy,dima}@cs.binghamton.edu

Abstract—We propose Dynamic Associative Cache (DAC), a low-complexity design that improves the energy efficiency of data caches with negligible performance overhead. The key idea of DAC is to adapt cache associativity dynamically during program execution by switching the cache between direct-mapped and set-associative operation. To monitor the program's associativity needs, the DAC design employs a subset of shadow tags: when the main cache operates in the set-associative mode, the shadow tags operate in the direct-mapped mode, and vice versa. The difference in hit rates between the main tags and the shadow tags serves as the indicator for switching the cache mode. We show that DAC performs most of its accesses in the direct-mapped mode, which yields significant energy savings, while maintaining performance close to that of a set-associative L1 D-cache.

I. INTRODUCTION

First-level caches consume a significant amount of energy in modern processors. According to recent reports [13], [11], between 12% and 45% of the processor core energy is attributed to first-level caches. Most processors today use set-associative first-level caches to increase cache hit rates. In an N-way cache, all N tag and data arrays are accessed in parallel, but only one of the ways returns the requested data on a cache hit; the remaining ways are accessed in vain, wasting energy. Serializing the tag and data accesses and reading data only from the matching way is impractical for first-level caches, because it lengthens the critical timing path or requires an additional pipeline stage for the cache access, which hurts performance.
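As a rough illustration of the energy argument above, the sketch below counts data-array way activations per access in a deliberately simplified model. It assumes that every set-associative access reads all N ways and every direct-mapped access reads exactly one. It ignores tag-array energy, shadow tags, miss handling and per-way energy differences, so it is a first-order count rather than the paper's energy model. The function name relative_way_reads and the parameter dm_fraction are hypothetical.

```python
def relative_way_reads(n_ways: int, dm_fraction: float) -> float:
    """Data-array way activations per access, normalized to an always
    set-associative cache that reads all n_ways in parallel.

    Simplified model (assumption): a direct-mapped-mode access reads 1 way,
    a set-associative-mode access reads n_ways ways.
    """
    if n_ways < 1 or not 0.0 <= dm_fraction <= 1.0:
        raise ValueError("n_ways must be >= 1 and dm_fraction in [0, 1]")
    per_access = dm_fraction * 1 + (1.0 - dm_fraction) * n_ways
    return per_access / n_ways
```

For example, with N = 4 and a hypothetical 90% of accesses served in direct-mapped mode, the model gives about 0.325, roughly a two-thirds reduction in data-array activations. The 90% figure is illustrative only, not a measured result.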
In this paper, we propose Dynamic Associative Cache (DAC), a technique that dynamically adjusts cache associativity (between the direct-mapped and set-associative modes) in response to program demands. DAC is based on the key observation that only a relatively small number of programs benefit substantially from higher cache associativity, while the majority of programs perform well even with direct-mapped caches. Without sacrificing cache capacity, the DAC design implements direct-mapped-style access to a set-associative cache by using a few least significant bits of the tag to explicitly select the cache way to be accessed. Access to all other ways is disabled, which saves energy. In effect, the index bits combined with the least significant tag bits serve as the index of a direct-mapped cache. When DAC operates in the set-associative mode, all cache ways are accessed as usual. The goal of DAC is to minimize the number of accesses performed in the set-associative mode (thus saving energy) without sacrificing performance. This is achieved by carefully controlling the mode transitions and performing them based on triggers generated by the cache performance monitoring unit (CPMU). The CPMU is a simple logic block that employs additional shadow tags [16] to keep track of how the cache would have performed had it been operating in the other mode. For example, while the cache operates in the set-associative mode, the shadow tags track its hypothetical performance in the direct-mapped mode. Conversely, while the cache operates in the direct-mapped mode, the shadow tags track the performance of a hypothetical set-associative cache. The CPMU measures the difference in hit rate between the actual and the hypothetical modes and performs a mode transition when certain thresholds are crossed.
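The two mechanisms just described, direct-mapped way selection and CPMU-driven mode switching, can be sketched in Python under stated assumptions. The way used in direct-mapped mode is taken from the log2(N) least significant tag bits, which is exactly what selecting one of N ways requires. The CPMU is reduced to a single threshold on the hit-rate gap, evaluated once per fixed interval of accesses. The names CacheGeometry, ModeMonitor, interval and threshold are hypothetical, and the single-threshold policy is an illustration rather than DAC's actual switching rule. The sketch models only bookkeeping, not the tag arrays themselves.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CacheGeometry:
    offset_bits: int  # log2(line size in bytes)
    index_bits: int   # log2(number of sets)
    way_bits: int     # log2(associativity N)

    def split(self, addr: int) -> tuple[int, int]:
        """Return (set index, tag) for a byte address."""
        set_index = (addr >> self.offset_bits) & ((1 << self.index_bits) - 1)
        tag = addr >> (self.offset_bits + self.index_bits)
        return set_index, tag

    def direct_mapped_way(self, addr: int) -> int:
        """Way probed in direct-mapped mode: the low tag bits select it."""
        _, tag = self.split(addr)
        return tag & ((1 << self.way_bits) - 1)


class ModeMonitor:
    """Hypothetical CPMU model. Main tags report hits in the current mode;
    shadow tags report the hits the other mode would have had."""

    def __init__(self, interval: int, threshold: float):
        self.interval = interval      # accesses per evaluation window (assumed)
        self.threshold = threshold    # SA-minus-DM hit-rate gap that justifies SA mode (assumed)
        self.set_associative = False  # start in direct-mapped mode
        self._reset()

    def _reset(self) -> None:
        self.accesses = 0
        self.main_hits = 0
        self.shadow_hits = 0

    def record(self, main_hit: bool, shadow_hit: bool) -> bool:
        """Log one access; return True if the cache is (now) in SA mode."""
        self.accesses += 1
        self.main_hits += int(main_hit)
        self.shadow_hits += int(shadow_hit)
        if self.accesses == self.interval:
            gap = (self.shadow_hits - self.main_hits) / self.interval
            # Shadow tags model the *other* mode, so flip the sign in SA mode
            # to always obtain (SA hit rate - DM hit rate).
            sa_gain = gap if not self.set_associative else -gap
            self.set_associative = sa_gain > self.threshold
            self._reset()
        return self.set_associative
```

For example, with 64-byte lines, 64 sets and 4 ways (offset_bits=6, index_bits=6, way_bits=2), an address whose tag is 0b1011 is probed only in way 3 while the cache is in direct-mapped mode.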
Compared to simply using cache misses as the mode-switch trigger, such shadow-tag monitoring makes it possible to distinguish conflict misses from capacity misses. All monitoring counters are reset periodically. To reduce the design complexity introduced by the shadow tags, we use set sampling [8], where only a small subset of the cache sets is monitored through the shadow tags. Accesses to all other sets, which are not covered by the sampled shadow sets, are not taken into account when making transition decisions. A transition from the direct-mapped to the set-associative access mode in DAC is simple and requires no additional actions, because all data brought into the cache while it was operating in the direct-mapped mode will be found by the set-associative tag search. The reverse transition, from the set-associative to the direct-mapped mode, is more challenging. The direct-mapped search is limited to a single way, so some cache hits can be erroneously identified as misses. Furthermore, once such an erroneous miss is serviced, the cache can end up holding duplicate copies of the same data. Several solutions to these problems are possible. For a write-through cache, the entire cache can simply be invalidated when a transition from the set-associative to the direct-mapped mode is made. For a write-back cache, the mode transition would require writing back all dirty lines to the L2 cache. Since the transition from the set-associative to the direct-mapped mode is more complex, we propose two variants of DAC: DAC-Budget and DAC-Deluxe. The DAC-Budget implementation supports only the transition from the direct-mapped to the set-associative mode and operates in the set-associative mode