Reducing Cache Misses by Application-Specific
Re-Configurable Indexing
K. Patel‡   L. Benini#   E. Macii‡   M. Poncino⋄
# Università di Bologna, Bologna, ITALY 40136
‡ Politecnico di Torino, Torino, ITALY 10129
⋄ Università di Verona, Verona, ITALY 37134
ABSTRACT
The predictability of memory access patterns in embed-
ded systems can be successfully exploited to devise effective
application-specific cache optimizations. In this work, we pro-
pose an improved indexing scheme for direct-mapped caches,
which drastically reduces the number of conflict misses by us-
ing application-specific information; the scheme is based on
the selection of a subset of the address bits.
With respect to similar approaches, our solution has two main
strengths. First, it models the misses analytically by building
a miss equation, and exploits a symbolic algorithm to compute
the exact optimum solution (i.e., the subset of address bits to
be used as cache index that minimizes conflict misses). Sec-
ond, we designed a re-configurable bit selector, which can be
programmed at run-time to fit the optimal cache indexing to
a given application.
Results show an average reduction of conflict misses of 24%,
measured over a set of standard benchmarks and for different
cache configurations.
1. Introduction
Embedded systems offer additional opportunities for perfor-
mance or energy optimization with respect to their general-
purpose counterparts, since their execution context (a well-
defined application mix) results in a higher predictability of
execution and memory access patterns. Such predictability
enables powerful application-specific optimizations that have
been particularly successful in the design of the memory hi-
erarchy (see [1, 2] for a comprehensive survey).
Modern embedded processors have extensive support for hi-
erarchical memory organization; for this reason, recent works
have started to address the issue of application-specific cache
optimizations ([4]–[6]).
In this work, we propose an application-specific cache opti-
mization which targets the reduction of conflict misses. We
focus on direct-mapped caches, which are known to provide
shorter access times, have a simple implementation, and are
conventionally used as L1 caches, especially in low-end proces-
sors. In direct-mapped caches, cache lines are conventionally
addressed using the least significant bits of the memory
address. For some access patterns, this indexing mechanism
may incur a large number of conflict misses.
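To make the problem concrete, the following toy simulation (illustrative, not from the paper) models a direct-mapped cache indexed with the least significant bits and shows how a simple interleaved access pattern over two arrays placed one cache-size apart makes every reference miss:

```python
# Toy direct-mapped cache that indexes with the least significant
# address bits, demonstrating conflict misses on a strided pattern.

def lsb_index(addr, index_bits, offset_bits):
    """Conventional indexing: drop the offset bits, keep the next index_bits."""
    return (addr >> offset_bits) & ((1 << index_bits) - 1)

def count_misses(trace, index_bits, offset_bits):
    """Count misses in a direct-mapped cache storing one tag per line."""
    lines = {}  # index -> tag currently resident
    misses = 0
    for addr in trace:
        idx = lsb_index(addr, index_bits, offset_bits)
        tag = addr >> (offset_bits + index_bits)
        if lines.get(idx) != tag:
            misses += 1
            lines[idx] = tag
    return misses

# 16-line cache (4 index bits), 4-byte lines (2 offset bits) = 64 bytes.
# Two arrays 64 bytes apart are accessed alternately: corresponding
# elements share an index, so each access evicts the other array's line.
trace = []
for i in range(16):
    trace += [0x000 + 4 * i, 0x040 + 4 * i]
print(count_misses(trace, index_bits=4, offset_bits=2))  # 32: all accesses miss
```

Any index selection that includes one of the address bits distinguishing the two arrays (bit 6 here) would eliminate these conflicts, which is exactly the degree of freedom the proposed scheme exploits.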
Our solution improves the basic indexing scheme by exploiting
the possibility of profile-driven optimization. The proposed
indexing is based on the selection of the subset of memory-
address bits that minimizes conflict misses on a given
address trace. The main contributions of our work are in
two areas. First, we model conflict misses with a Boolean
condition (which analytically expresses the total number of
conflict misses of a trace), and we propose a novel symbolic
algorithm (based on BDDs and ADDs) which yields an exact
solution to the optimal bit selection problem.
Second, we propose an architectural extension, namely a pro-
grammable index bit selector which enables run-time adap-
tation of the index bit selection to different applications or
application mixes. We designed and optimized the layout of
the selector, and we show that it imposes minimal overhead
on cache access time and power. Results on a set of embed-
ded applications show an average reduction of conflict
misses of 24% (100% maximum, i.e., a reduction to zero
misses) with respect to conventional indexing using the least
significant bits.
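To illustrate the kind of adaptation such a selector enables (the mask values and bit positions below are hypothetical, chosen for illustration, and say nothing about the paper's actual hardware layout), here is a software sketch of run-time index-bit selection driven by a programmable mask:

```python
# Hypothetical model of a programmable index bit selector: a run-time
# mask chooses which address bits form the cache index. The hardware
# would route the masked bits to the index lines; in software we
# compact them LSB-first (like the x86 PEXT instruction).

def select_index(addr, mask):
    """Gather the address bits where mask has a 1, LSB-first."""
    idx, out_pos, bit = 0, 0, 0
    while mask >> bit:
        if (mask >> bit) & 1:
            idx |= ((addr >> bit) & 1) << out_pos
            out_pos += 1
        bit += 1
    return idx

# Conventional indexing of a 16-line cache with 4-byte lines selects
# bits [5:2]; an application-tuned configuration might swap bit 2 for
# bit 7 (an illustrative choice, not a result from the paper).
conventional = 0b0000111100   # bits 2..5
tuned        = 0b1011100000   # bits 5..7 and 9? no -- see masks below
tuned        = 0b0010111000   # bits 3..5 and 7
addr = 0b10101100
print(select_index(addr, conventional), select_index(addr, tuned))  # 11 13
```

Reprogramming the selector for a new application then amounts to writing a different mask, with no change to the cache arrays themselves.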
The paper is organized as follows. Section 2 reviews previous
works on advanced cache indexing schemes. Section 3 de-
scribes the proposed indexing algorithm. Section 4 illustrates
the architecture of the programmable selector. Simulation re-
sults are provided in Section 5. Finally, Section 6 draws some
concluding remarks.
2. Background and Previous Work
The problem of optimal cache indexing has strong analogies
with optimal hashing. In fact, indexing can be viewed as
mapping a set of 2^n items (the space of all possible n-bit
addresses – the key space, in the hashing terminology) into a
(much smaller) space of 2^m, m < n, items (the number of cache
entries – the address space), with the objective of minimizing
the number of conflicts (collisions), that is, distinct items
mapping to the same entry.
This task is accomplished by means of a hash function H,
which maps a generic address X in the key space to a value
Y = H(X) of the address space. There exists a vast theory
about hash functions, with strong optimality results for per-
fect hashing [3], but such advanced hashing techniques are not
suitable for hardware implementation.
Conventional cache indexing implements the traditional
“modulo”-based hashing, in which a key X is mapped to an
address Y = X mod 2^m; this scheme guarantees a reasonable
distribution of the keys over the address space, and lends it-
self to a minimum-cost hardware implementation, since the
modulo operation amounts to selecting the least significant
bits of the address.
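The equivalence between modulo hashing and bit selection is a standard identity and can be checked directly: for a cache with 2^m entries, Y = X mod 2^m keeps exactly the m least significant bits of X.

```python
# For any address X and index width m, X mod 2**m equals the m low
# bits of X, so modulo indexing is free in hardware: no arithmetic,
# just wire the low address bits to the index.
m = 6
for x in (0x00, 0x3F, 0x40, 0x1234, 0xBEEF):
    assert x % (1 << m) == x & ((1 << m) - 1)
print("modulo-2^m hashing == keeping the low", m, "bits")
```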
Several alternative hashing schemes have also been explored
for cache indexing. Exclusive-OR (XOR) based hashing
functions are a popular solution [7, 8, 9]. Index bits
are obtained by XOR-ing a given number of address bits; the
general idea can be extended to Boolean operators other than
0-7803-8702-3/04/$20.00 ©2004 IEEE. 125
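A minimal sketch of XOR-based indexing (the particular bit pairing below is illustrative, not taken from [7, 8, 9]): each index bit is formed by XOR-ing a low address bit with a corresponding tag bit, so addresses that differ only in their upper bits no longer collide.

```python
# Toy XOR-based index function: fold m tag bits onto the m low
# index bits with XOR (the pairing of bits is illustrative).

def xor_index(addr, m):
    """Index = (low m bits) XOR (next m bits) of the address."""
    low = addr & ((1 << m) - 1)
    high = (addr >> m) & ((1 << m) - 1)
    return low ^ high

# Addresses exactly 2**m apart collide under plain modulo indexing
# (both map to index 0) but get distinct indices here.
m = 4
a, b = 0x10, 0x20
print(xor_index(a, m), xor_index(b, m))  # 1 2 -- distinct indices
```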