Technical Report, Department of Electrical and Computer Engineering, Iowa State University, Iowa, USA

MANAGER: A Multicore Shared Cache Energy Saving Technique for QoS Systems

Sparsh Mittal and Zhao Zhang
Department of Electrical and Computer Engineering
Iowa State University, Ames, Iowa 50011, USA
Email: sparsh0mittal@gmail.com, zzhang@iastate.edu

Abstract

Last level caches (LLCs) contribute significantly to processor power consumption. Saving LLC energy in multicore QoS systems is especially challenging, since aggressive energy saving techniques may lead to failure in providing QoS. We present MANAGER, a multicore shared cache energy saving technique for quality-of-service systems. Using dynamic profiling, MANAGER periodically predicts cache access activity for different configurations. The cache is then partitioned among the running programs to fulfill the QoS requirement while saving memory subsystem (LLC+DRAM) energy. Out-of-order simulations performed using dual-core workloads from the SPEC2006 suite show that, for a 4MB LLC, MANAGER saves 13.5% of memory subsystem energy relative to a statically, equally partitioned baseline cache.

I. INTRODUCTION

In recent years, energy efficiency has emerged as the fundamental bottleneck in scaling processor performance. Moreover, cache energy consumption is becoming a significant fraction of processor power consumption [1, 2]. Several recent trends motivate this shift. Several e-learning and multimedia applications present high QoS and performance demands [3]. Since the LLC is the last line of defense against the memory wall and QoS (quality of service) is crucially affected by the behavior of the shared LLC [4, 5], modern processors use large LLCs. With each CMOS technology generation, leakage energy consumption has been increasing drastically, and thus the energy consumption of large LLCs is on the rise. Hence, effective management of the LLC in multicore processors is important for achieving both QoS and energy efficiency.
This work is supported in part by the National Science Foundation under grants CNS-0834476 and CNS-1117604.

Existing cache energy saving techniques have several limitations. Some techniques save energy aggressively and may incur large performance loss [6, 7]; in QoS systems, they may therefore fail to meet the QoS requirements. Further, modern multicore processors may run arbitrary combinations of benchmarks and