Adaptive Line Placement with the Set Balancing Cache

Dyer Rolán, Basilio B. Fraguela, Ramón Doallo
Depto. de Electrónica e Sistemas
Universidade da Coruña
A Coruña, Spain
{drolan, basilio, doallo}@udc.es

ABSTRACT

Efficient memory hierarchy design is critical due to the increasing gap between the speed of the processors and the memory. One of the sources of inefficiency in current caches is the non-uniform distribution of the memory accesses on the cache sets. Its consequence is that while some cache sets may have working sets that are far from fitting in them, other sets may be underutilized because their working set has fewer lines than the set. In this paper we present a technique that aims to balance the pressure on the cache sets by detecting when it may be beneficial to associate sets, displacing lines from stressed sets to underutilized ones. This new technique, called Set Balancing Cache or SBC, achieved an average reduction of 13% in the miss rate of ten benchmarks from the SPEC CPU2006 suite, resulting in an average IPC improvement of 5%.

Categories and Subject Descriptors: B.3.2 [Memory Structures]: Design Styles: cache memories

General Terms: Design, Performance

Keywords: cache, performance, adaptivity, balancing

1. INTRODUCTION

Memory references are often not uniformly distributed across the sets of a set-associative cache, the most common design nowadays [14]. As a result, at a given point during the execution of a program there are usually sets whose working set is larger than their number of lines (the associativity of the cache), while the situation in other sets is exactly the opposite. The outcome of this is that some sets exhibit large local miss ratios because they do not have the number of lines they need [9], while other sets achieve good local miss ratios at the expense of a poor usage of their lines, because some or many of them are actually not needed to keep the working set.
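The imbalance described above can be illustrated with a minimal sketch. It is not part of the SBC design; it only counts, for a given address trace, how many distinct lines map to each set and compares that count with the associativity. The function names, the conventional modulo indexing, and the use of distinct lines over the whole trace as a crude proxy for a set's working set are all assumptions made for illustration.

```python
from collections import defaultdict


def set_working_sets(addresses, num_sets, line_size):
    """Group the distinct line addresses of a trace by the set they map to.

    Assumes conventional indexing: set index = line address mod num_sets.
    """
    lines_per_set = defaultdict(set)
    for addr in addresses:
        line = addr // line_size
        lines_per_set[line % num_sets].add(line)
    return lines_per_set


def classify_sets(addresses, num_sets, line_size, associativity):
    """Return (stressed, underused) lists of set indices.

    A set is 'stressed' when more distinct lines map to it than it has ways,
    and 'underused' when fewer do. Distinct lines over the whole trace are
    only a rough stand-in for the working set (an assumption of this sketch).
    """
    ws = set_working_sets(addresses, num_sets, line_size)
    stressed = sorted(s for s in range(num_sets) if len(ws[s]) > associativity)
    underused = sorted(s for s in range(num_sets) if len(ws[s]) < associativity)
    return stressed, underused


# Hypothetical trace: lines 0, 4, 8, 12 collide in set 0 of a 4-set cache.
trace = [line * 64 for line in (0, 4, 8, 12, 1, 2, 6, 0, 4)]
stressed, underused = classify_sets(trace, num_sets=4, line_size=64, associativity=2)
print(stressed, underused)  # prints "[0] [1, 3]"
```

In this toy trace, set 0 needs four lines but has only two ways, while sets 1 and 3 have spare lines; this is the kind of situation in which associating sets could pay off.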
An intuitive answer to this problem is to increase the associativity of the cache. Multiplying the associativity by a given factor is equivalent to merging that many sets into a single one, joining not only all their lines, but also their corresponding working sets. This makes it possible to balance smaller working sets with larger ones, making previously underutilized lines available to the latter, which results in smaller miss rates. Unfortunately, increments in associativity negatively impact access latency and power consumption (e.g. more tags have to be read and compared in each access) as well as cache area, besides increasing the cost and complexity of the replacement algorithm. Worse, progressive increments in the associativity provide diminishing returns in miss rate reduction since, in general, the larger (and fewer) the sets are, the more similar or balanced their working sets tend to be. For these reasons, only restricted levels of associativity are found in current caches.

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. To copy otherwise, to republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee.
MICRO’09, December 12–16, 2009, New York, NY, USA.
Copyright 2009 ACM 978-1-60558-798-1/09/12 ...$10.00.

In this paper we propose an approach to associate cache sets whose working set does not seem to fit in them with sets whose working set fits, enabling the former to make use of the underutilized lines of the latter. Namely, this cache design, which we call Set Balancing Cache or SBC, shifts lines from sets with high local miss rates to sets with underutilized lines where they can be found later.
Notice that while an increase in associativity equates to merging sets in an indiscriminate way, our approach only exploits the resources of several sets jointly when it seems to be beneficial. Also, increases in associativity cannot choose which sets to merge, while the SBC can be implemented using either a static policy, which preestablishes which sets can be associated, or a dynamic one, which allows a set to be associated with any other one. Thus, as we will see in the evaluation, the SBC achieves better performance than equivalent increases in associativity without incurring their drawbacks.

The rest of this paper is organized as follows. The next section describes the algorithm and structure of a static SBC, in which sets can only be associated with other sets depending on a preset condition on their index. Section 3 introduces a dynamic SBC that allows lines to be shifted from a set that behaves badly to the best set available (i.e. not yet associated) in the cache. Both SBC proposals are evaluated using the environment described in Section 4, and the results are discussed in Section 5. The cost of both approaches is examined in Section 6. A deeper analysis of the cost and performance of the SBC is presented in Section 7. Related work is discussed and compared in Section 8. The last section is devoted to the conclusions and future work.

2. STATIC SET BALANCING CACHE

We seek to reduce the pressure on the cache sets that are unable to hold all the lines in their working set, by displacing