Integrated Thermal Analysis for Processing In Die-Stacking Memory

Yuxiong Zhu, Borui Wang, Dong Li, Jishen Zhao
University of California at Santa Cruz, University of California at Merced
{yzhu29, bwang27, jishen.zhao}@ucsc.edu, dli35@ucmerced.edu

MEMSYS 2016, October 3–6, 2016, Washington, DC, USA
© 2016 ACM. ISBN 978-1-4503-4305-3. . . $15
DOI: http://dx.doi.org/10.1145/2989081.2989093

ABSTRACT
Recent application and technology trends have brought about a renaissance of processing-in-memory (PIM), a concept envisioned decades ago. In particular, die-stacking and silicon interposer technologies enable the integration of memory, PIMs, and the host CPU in a single chip. Yet such integration substantially increases system power density, which can pose significant thermal challenges to the feasibility of these systems. In this paper, we comprehensively study the thermal feasibility of integrated systems consisting of the host CPU, die-stacking DRAMs, and various types of PIMs. Unlike most previous thermal studies, which focus only on the memory stack, we investigate the thermal distribution of the whole processor-memory system. Furthermore, we examine the feasibility of various cooling solutions and the feasible scale of various PIM designs under given thermal and area constraints. Finally, we demonstrate the run-time thermal feasibility of the system by executing two high-performance computing applications on PIM-based systems. Based on our experimental studies, we reveal a set of thermal implications for PIM-based system design and configuration.

CCS Concepts
•Computer systems organization → Data flow architectures; Heterogeneous (hybrid) systems; Special purpose systems;

Keywords
Processing-in-memory; Die stacking; Interposer; Thermal; High-performance computing

1. INTRODUCTION
Processing-in-memory (PIM), also known as near-memory computing or near-data processing, builds on the basic idea of integrating computation directly into memory devices [31, 14, 17, 53, 37, 21, 11, 36, 50, 52]. After decades of dormancy, it is re-emerging in a new form, driven by recent application and technology trends. On the application front, in-memory databases [57, 60], web-scale applications [47, 15], high-performance computing [54, 10, 30, 62], and in-situ data processing such as scientific visualization for real-time analysis [38] manipulate increasingly large volumes of data in memory. Data movement between the CPU and memory is becoming one of the major contributors to system energy consumption and performance degradation [64, 3, 46]. This motivates moving computation close to memory, where the working data resides. On the technology front, recent advances in 3D-stacked memory [48, 12, 34, 67, 68, 9] enable stacking a logic (silicon) die, fabricated in a high-performance process technology, with one or more memory (e.g., DRAM) layers. The logic die offers sufficient silicon area and performance to implement various logic and computation functions, such as adders, memory copiers, CPU cores, and GPUs [13, 4]. Recent studies [19, 1, 2, 32, 51, 33, 65, 20, 4, 13] show that such integration technologies are likely to make PIM practical. One major concern in adopting PIMs with die-stacking memory is thermal feasibility.
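To make the thermal concern concrete, the following is a minimal sketch of a one-dimensional, steady-state lumped thermal model of a die stack. It assumes (our assumption, not the paper's methodology) that the heat sink sits on top of the stack, that no heat leaves through the bottom, and that lateral spreading can be ignored; the function name `stack_temperatures` and all numbers in the usage example are hypothetical. Under these assumptions, all power dissipated in the bottom logic (PIM) die must cross every DRAM layer above it, so its temperature accumulates every resistance on the way to the heat sink.

```python
from typing import List


def stack_temperatures(powers_w: List[float],
                       resistances_k_per_w: List[float],
                       t_ambient_c: float) -> List[float]:
    """Steady-state layer temperatures of a hypothetical 1D die stack.

    powers_w[i]: power (W) dissipated in layer i; index 0 is the bottom die
        (e.g., the logic/PIM die), the last index is the top die.
    resistances_k_per_w[i]: thermal resistance (K/W) from layer i to the layer
        above it; the last entry is from the top die to ambient via the heat sink.
    Assumes all heat flows upward to a heat sink on top (adiabatic bottom)
    and ignores lateral spreading and any neighboring host CPU.
    """
    if len(powers_w) != len(resistances_k_per_w):
        raise ValueError("need exactly one resistance per layer")
    n = len(powers_w)
    temps = [0.0] * n
    # Walk from the top layer (closest to the heat sink) down to the bottom.
    t_above = t_ambient_c
    for i in reversed(range(n)):
        # Heat crossing the interface above layer i is the power of layers 0..i.
        heat_through_w = sum(powers_w[: i + 1])
        temps[i] = t_above + heat_through_w * resistances_k_per_w[i]
        t_above = temps[i]
    return temps


if __name__ == "__main__":
    # Hypothetical stack: 10 W logic die under four 1 W DRAM dies,
    # 0.5 K/W between dies, 1.0 K/W heat sink, 45 C ambient.
    print(stack_temperatures([10, 1, 1, 1, 1], [0.5, 0.5, 0.5, 0.5, 1.0], 45.0))
    # → [82.0, 77.0, 71.5, 65.5, 59.0]
```

In this toy setting the bottom logic die is the hottest layer, and raising its power raises the temperature of every DRAM die above it. The sketch deliberately leaves out the host CPU on the interposer, which is precisely the interaction that a stack-only analysis cannot capture.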
Prior studies have demonstrated the thermal feasibility of integrating programmable PIMs with 3D-stacked memory [13]. However, most previous related work focuses only on the thermal issues of the PIM-based memory stack itself; the thermal feasibility of the integrated system, comprising the host CPU and the memory stack, remains largely unknown. Die-stacking memories are typically integrated with a host CPU on a silicon interposer [61, 35] in a single chip. This effectively reduces the footprint of the processor-memory system but also increases its power density. As such, the integration of memory, PIMs, and the host CPU can significantly increase system power density and impede heat dissipation, reducing the thermal feasibility of PIM-based designs. Studying the memory stack alone is therefore insufficient to understand the thermal feasibility of the integrated system, because the thermal interaction between the integrated host CPU and the memory stack complicates the analysis. Host CPU heat dissipation can heavily impact the temperature of