Improving Data Cache Performance using
Persistence Selective Caching
Sumeet S. Kumar, Rene van Leuken
Circuits and Systems Group, Faculty of EEMCS,
Delft University of Technology, The Netherlands
{s.s.kumar, t.g.r.m.vanleuken}@tudelft.nl
Abstract—This paper presents Persistence Selective Caching
(PSC), a selective caching scheme that tracks the reusability of L1
data cache (L1D) lines at runtime, and moves lines with sufficient
potential for reuse to a low-latency, low-energy assist cache from
where subsequent references to them are serviced. The selectivity
of PSC is configurable, and can be adjusted to suit the varying
memory access characteristics of different applications, unlike
existing schemes. By effectively identifying reusable cache lines
and storing them in the assist, PSC reduces average memory
access time by up to 59% compared to competing schemes and
conventional data caches. Furthermore, by ensuring that only
reusable lines are cached by the assist, PSC reduces cache line
movements, and thus decreases average energy per access by up to
75% over other assists.
Keywords—cache memory, memory architecture, memory man-
agement, microprocessors
I. INTRODUCTION
The limited size of processor caches in comparison to
the data set of modern applications leads to the emergence
of expensive misses, necessitating high latency and energy
consuming accesses to lower levels of the memory hierarchy.
Although large set-associative caches appreciably reduce miss
rates, their size causes them to have a higher hit-latency, and
consume more energy per access than smaller direct-mapped
caches.
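This trade-off is commonly captured by the standard average memory access time (AMAT) model from the architecture literature; as a sketch for a two-level hierarchy (the symbols here are ours, not the paper's notation):

```latex
\mathrm{AMAT} = t_{\mathrm{hit,L1}} + m_{\mathrm{L1}} \cdot \left( t_{\mathrm{hit,L2}} + m_{\mathrm{L2}} \cdot t_{\mathrm{mem}} \right)
```

where \(t_{\mathrm{hit}}\) is the hit latency of a level, \(m\) its local miss rate, and \(t_{\mathrm{mem}}\) the main-memory access penalty. A larger, more associative L1 lowers \(m_{\mathrm{L1}}\) but raises \(t_{\mathrm{hit,L1}}\) on every access, which is precisely the tension an assist cache targets.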
This paper presents the Persistence Selective Caching
(PSC) scheme which reduces average memory access time
(AMAT) through the selective caching of reusable lines in a
small, fully-associative assist cache. The reuse potential of a
line is estimated at runtime based on its access persistence, i.e.
the number of accesses to the line within a certain window of
data references by the processor. Lines with sufficient access
persistence are moved from the L1 data cache (L1D) into the
assist cache from where subsequent references to them are
serviced. Due to the assist’s small size, these references only
incur a short access latency, and consume considerably less
energy than an L1D access. PSC's selectivity ensures that only
the most reusable lines are moved to the assist, leading to a
significant reduction in the number of cache line movements
(swaps), and thus lower energy per access than competing
schemes. The significant contributions of this paper are:
• A configurable scheme that tracks the access persistence
of cache lines at runtime, and selectively caches those
with sufficient persistence in a low-latency, low-energy
assist cache. The selectivity of PSC can be adjusted to
suit the varying memory access characteristics of different
applications, unlike existing schemes.
• An illustration of the performance and energy benefits of
selective assist caching. PSC reduces AMAT by up to
59% and average energy per access by up to 75% compared
to conventional data caches and competing assists [1][2].
This research was supported in part by the CATRENE programme under
the Computing Fabric for High Performance Applications (COBRA) project
CA104.
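To make the persistence idea concrete, the following sketch counts accesses to each line within a sliding window of recent data references and promotes lines whose count crosses a threshold into a small assist. The window size, threshold, data structures, and FIFO assist replacement here are illustrative assumptions, not the paper's actual design.

```python
from collections import deque, defaultdict

class PersistenceTracker:
    """Sketch of persistence-based selective caching (parameters illustrative)."""

    def __init__(self, window=64, threshold=4, assist_size=8):
        self.window = window          # number of recent references considered
        self.threshold = threshold    # accesses needed to count as "persistent"
        self.assist_size = assist_size
        self.recent = deque()         # line addresses of recent references
        self.counts = defaultdict(int)
        self.assist = []              # lines promoted to the assist cache

    def access(self, line_addr):
        """Record a reference; promote the line if it shows sufficient persistence."""
        # Slide the window: drop the oldest reference once the window is full.
        if len(self.recent) == self.window:
            old = self.recent.popleft()
            self.counts[old] -= 1
            if self.counts[old] == 0:
                del self.counts[old]
        self.recent.append(line_addr)
        self.counts[line_addr] += 1
        # Promote the line once it crosses the persistence threshold.
        if self.counts[line_addr] >= self.threshold and line_addr not in self.assist:
            if len(self.assist) == self.assist_size:
                self.assist.pop(0)    # simple FIFO replacement in the assist
            self.assist.append(line_addr)
        return line_addr in self.assist  # True if the reference hits the assist
```

A line referenced repeatedly within the window gets promoted, while a line touched once does not; whether the first reference has already slid out of the window determines whether the count survives, which is how the window bounds the scheme's notion of reuse.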
This paper is organized as follows: In Section II, we review
the state of the art in cache assists, and outline the motivation
for PSC. In Section III, we describe the architecture and
algorithms of PSC, and in Section IV, evaluate its effectiveness
in reducing AMAT and energy per access.
II. RELATED WORK
A number of past studies have used small memory
buffers to augment the capacity of the main L1D, and thus
improve performance and reduce energy consumption. In this paper,
such memory structures are referred to as assist caches. The
Filter cache [3] for instance is an assist that reduces energy
consumption for cache memory accesses by using a very small
memory buffer in between the processor and L1D. However,
these energy savings are obtained at the cost of increased
access latencies, and thus higher average memory access time
(AMAT). The Victim Cache (VC) [4] on the other hand aims
to decrease AMAT by reducing the cost of conflict misses.
The VC stores victims of L1D evictions such that in the event
of future references to them, the lines can be returned to the
L1D in a single cycle rather than through a long latency,
energy consuming cache miss. On a VC hit, the requested
line is moved to the L1D, and the corresponding entry from
the L1D evicted to the VC. This swap operation constitutes
an energy overhead, and is a significant disadvantage of the
victim cache. Stiliadis et al. overcame this disadvantage with
their proposal, Selective Victim Caching (SVC) [1]. In SVC, the
swap operation is prevented from occurring if the incumbent
L1D cache line is found to be more reusable than the requested
VC line. SVC considerably reduces the number of swaps as
compared to a conventional victim cache with the same miss
rate and latency improvements.
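The swap decision on a VC hit, and SVC's refinement of it, can be sketched as follows. The per-line reuse counter below is an assumed, simplified stand-in for SVC's actual reuse-prediction state, which the original proposal [1] defines in more detail.

```python
def handle_vc_hit(l1d_line, vc_line, selective=True):
    """
    Sketch of the swap decision on a victim-cache hit.
    l1d_line / vc_line are dicts with a 'reuse' counter (a simplified,
    assumed stand-in for SVC's reuse-prediction state).
    Returns True if the lines are swapped between the L1D and the VC.
    """
    if selective and l1d_line["reuse"] > vc_line["reuse"]:
        # SVC: the incumbent L1D line is predicted more reusable, so the
        # request is serviced from the VC without swapping, avoiding the
        # energy cost of two line movements.
        return False
    # Conventional VC: always swap the requested line into the L1D and
    # evict the conflicting L1D entry back into the VC.
    return True
```

With `selective=False` the function degenerates to the conventional VC policy, which swaps on every hit; SVC's saving comes entirely from the cases where the comparison suppresses the swap.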
However, these proposals consider the L1D as the primary
target for data references by the processor, and the assist as an
auxiliary cache. A majority of references are thus serviced
by the larger L1D cache, and consequently, the relatively
shorter latency and energy per access of the assist cache remain
978-1-4799-3432-4/14/$31.00 ©2014 IEEE 1945