Integrated Thermal Analysis for Processing In Die-Stacking Memory
Yuxiong Zhu, Borui Wang, Dong Li‡, Jishen Zhao
University of California at Santa Cruz, ‡University of California at Merced
{yzhu29, bwang27, jishen.zhao}@ucsc.edu, dli35@ucmerced.edu
ABSTRACT
Recent application and technology trends bring about a renaissance of processing-in-memory (PIM), which was envisioned decades ago. In particular, die-stacking and silicon interposer technologies enable the integration of memory, PIMs, and the host CPU in a single chip. Yet this integration substantially increases system power density, which can pose significant thermal challenges to the feasibility of such systems. In this paper, we comprehensively study the thermal feasibility of integrated systems consisting of the host CPU, die-stacking DRAMs, and various types of PIMs. Unlike most previous thermal studies, which focus only on the memory stack, we investigate the thermal distribution of the whole processor-memory system. Furthermore, we examine the feasibility of various cooling solutions and the feasible scale of various PIM designs under given thermal and area constraints. Finally, we demonstrate system run-time thermal feasibility by executing two high-performance computing applications on PIM-based systems. Based on our experimental studies, we reveal a set of thermal implications for PIM-based system design and configuration.
CCS Concepts
•Computer systems organization → Data flow architectures; Heterogeneous (hybrid) systems; Special purpose systems;
Keywords
Processing-in-memory; Die stacking; Interposer; Thermal;
High-performance computing
MEMSYS 2016, October 3–6, 2016, Washington, DC, USA
© 2016 ACM. ISBN 978-1-4503-4305-3. . . $15
DOI: http://dx.doi.org/10.1145/2989081.2989093
1. INTRODUCTION
Processing-in-memory (PIM), also known as near-memory computing or near-data processing, builds on the basic idea of integrating computation directly into memory devices [31, 14, 17, 53, 37, 21, 11, 36, 50, 52]. After decades of dormancy, it is re-emerging in a new form due to recent application and technology trends. On the application front, in-memory databases [57, 60], web-scale applications [47, 15], high-performance computing [54, 10, 30, 62], and in-situ data processing such as scientific visualization for real-time analysis [38] manipulate increasingly large volumes of data in memory. Data movement between the CPU and memory is becoming one of the major contributors to system energy consumption and performance degradation [64, 3, 46]. This motivates moving computation close to memory, where the working data resides. On the technology front, recent advances in 3D-stacked memory [48, 12, 34, 67, 68, 9] enable stacking a logic (silicon) die, implemented in a high-performance process technology, with one or more memory (e.g., DRAM) layers. The logic die offers sufficient silicon area and performance capability to implement various logic and computation functions, such as adders, memory copiers, CPU cores, and GPUs [13, 4]. Recent studies [19, 1, 2, 32, 51, 33, 65, 20, 4, 13] demonstrate that such integration technologies are likely to make PIM practical.
One major concern in adopting PIMs with die-stacking memory is thermal feasibility. Prior studies demonstrated the thermal feasibility of integrating programmable PIMs with 3D-stacked memory [13]. However, most previous related work focuses only on the thermal issues of the PIM-based memory stack itself; the thermal feasibility of the integrated system (the host CPU and the memory stack together) remains largely unknown. Die-stacking memories are typically integrated with a host CPU on a silicon interposer [61, 35] in a single chip. This effectively reduces the footprint of the processor-memory system, but concentrates its heat sources in a smaller area. As a result, integrating memory, PIMs, and the host CPU can significantly increase system power density and impede heat dissipation, reducing the thermal feasibility of PIM-based designs. Studying the memory stack alone is therefore insufficient to understand the thermal feasibility of the integrated system.
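Why the memory stack cannot be analyzed in isolation can be illustrated with a deliberately crude lumped thermal-resistance model. This is a minimal sketch under stated assumptions, not the thermal model used in this paper: each component is assumed to have its own local thermal resistance to a heat sink shared by the whole package, and the sink has a single resistance to ambient. The function `component_temps`, the constants `R_LOCAL` and `R_SINK`, and all power values are hypothetical.

```python
# Crude lumped thermal model (an illustrative assumption, not the paper's model):
# every component has a local resistance to a heat sink that all components share.

def component_temps(powers_w, r_local_k_per_w, r_sink_k_per_w, t_ambient_c=25.0):
    """Return the steady-state temperature (deg C) of each component.

    powers_w:        {component name: dissipated power in W}
    r_local_k_per_w: {component name: resistance from that component to the sink, K/W}
    r_sink_k_per_w:  resistance from the shared heat sink to ambient, K/W
    """
    total_power_w = sum(powers_w.values())
    # The shared sink is heated by the power of *all* components in the package.
    t_sink_c = t_ambient_c + total_power_w * r_sink_k_per_w
    # Each component adds its own temperature rise across its local resistance.
    return {
        name: t_sink_c + p * r_local_k_per_w[name]
        for name, p in powers_w.items()
    }


# Hypothetical numbers, chosen only to illustrate the coupling effect.
R_LOCAL = {"host_cpu": 0.25, "dram_pim_stack": 1.5}
R_SINK = 0.25

stack_only = component_temps({"dram_pim_stack": 20.0}, R_LOCAL, R_SINK)
integrated = component_temps({"host_cpu": 80.0, "dram_pim_stack": 20.0}, R_LOCAL, R_SINK)

print(f"memory stack alone:     {stack_only['dram_pim_stack']:.1f} C")
print(f"stack next to host CPU: {integrated['dram_pim_stack']:.1f} C")
```

With these hypothetical values, the same 20 W memory stack reaches 60.0 C when modeled alone but 80.0 C when it shares a sink with an 80 W host CPU, because the CPU's heat also flows through the common path to ambient. A real analysis needs a spatially resolved thermal model rather than two lumped resistances, but even this toy model shows that the stack's temperature depends on the host CPU's power.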
The thermal interaction between the integrated host CPU and the memory stack can complicate the thermal analysis. Host
CPU heat dissipation can heavily impact the temperature of