APC: A Novel Memory Metric and Measurement
Methodology for Modern Memory Systems
Dawei Wang, Member, IEEE and Xian-He Sun, Fellow, IEEE
Abstract—Due to the infamous “memory wall” problem and a drastic increase in the number of data intensive applications, memory rather
than processors has become the leading performance bottleneck in modern computing systems. Evaluating and understanding memory
system performance is increasingly becoming the core of high-end computing. Conventional memory metrics, such as miss ratio and AMAT,
are designed to measure a given memory performance parameter, and do not reflect the overall performance or complexity of a
modern memory system. On the other hand, widely used system-performance metrics, such as IPC, are designed to measure CPU
performance, and do not directly reflect memory performance. In this paper, we propose a novel memory metric called Access Per Cycle
(APC), which is the number of data accesses per cycle, to measure the overall memory performance with respect to the complexity of
modern memory systems. A unique contribution of APC is its separation of memory evaluation from CPU evaluation; therefore, it provides
a quantitative measurement of the “data-intensiveness” of an application. Simulation results show that the memory performance measured
by APC captures the concurrency complexity of modern memory systems, while other metrics cannot. APC is simple, effective, and is
significantly more appropriate than existing memory metrics in evaluating modern memory systems.
Index Terms—Memory performance measurement, memory metric, measurement methodology
1 INTRODUCTION
The rapid advances of semiconductor technology have
driven large increases in processor performance over the
past thirty years. However, memory performance has not
experienced such dramatic gains; this leaves memory
performance lagging far behind CPU performance.
This growing performance gap between processor and mem-
ory is referred to as the “memory wall” [1], [2]. The “memory
wall” problem is experienced not only in main memory but
also in on-die caches. For example, in the Intel Nehalem
architecture CPU, each L1 data cache has a four-cycle hit
latency, and each L2 cache has a 10-cycle hit latency [3].
Additionally, the IBM Power6 has a four-cycle L1 cache hit
latency and an L2 cache hit latency of 24 cycles [4]. The large
performance gap between processor and memory hierarchy
makes memory-access the dominant performance factor in
high-end computing. Recent research tries to improve the
performance of memory systems. However, understanding
the performance of modern hierarchical memory systems
remains elusive for many researchers and practitioners.
While memory (for the remainder of this paper, “memory”
refers to the entire memory hierarchy)
is the bottleneck for performance, how to measure and
evaluate memory systems has become an important issue
facing the high performance computing community. The
conventionally used performance metrics, such as IPC
(Instructions Per Cycle) and FLOPS (floating-point operations
per second), are designed from a computing-centric point of
view. As such, they are comprehensive but are affected by
instruction sets, CPU micro-architecture, memory hierarchy,
and compiler technologies, and cannot be applied directly to
measure the performance of a memory system. On the other
hand, existing memory performance metrics, such as miss
rate, bandwidth, and average memory access time (AMAT),
are designed to measure a particular component of a memory
system or the performance of a single access of the memory
system. They are useful in optimization and evaluation of a
given component, but cannot accurately characterize the
performance of the memory system as a whole. In general,
component improvement does not necessarily lead to an
improvement in overall performance. For instance, when the
miss rate decreases, IPC may not increase, and sometimes IPC
even decreases (see Section 4.2 for details). When non-blocking
caches are used, AMAT can even correlate negatively with
IPC (see Section 4.2.3 for details). Since there is no known
correlation study between existing memory metrics and the
final system performance, a frequent and common question of
practitioners is whether a component improvement actually
leads to a system improvement. Therefore, an appropriate
metric to measure memory systems is critically needed to
analyze system design and performance enhancements.
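To make the contrast concrete, the following sketch (with hypothetical latencies and miss rates, not taken from this paper) computes the classic per-access latency metric AMAT alongside an accesses-per-cycle throughput figure. With a blocking cache, where accesses serialize, the two metrics tell the same story; once misses are allowed to overlap, throughput improves even though no individual access gets cheaper, which per-access metrics cannot express.

```python
# Hypothetical sketch: per-access latency (AMAT) vs. a
# throughput-style accesses-per-cycle measure. All numbers
# (hit time, miss rate, miss penalty, cycle counts) are
# illustrative assumptions, not measurements from the paper.

def amat(hit_time, miss_rate, miss_penalty):
    # Average Memory Access Time: cost of one access in isolation.
    return hit_time + miss_rate * miss_penalty

def accesses_per_cycle(total_accesses, memory_cycles):
    # Throughput view: accesses completed per cycle of memory activity.
    return total_accesses / memory_cycles

# Blocking cache: accesses are serialized, so the memory-active
# cycle count is roughly total_accesses * AMAT.
blocking_amat = amat(hit_time=4, miss_rate=0.10, miss_penalty=100)
blocking_apc = accesses_per_cycle(1000, 1000 * blocking_amat)

# Non-blocking cache (assumed overlap): the same 1000 accesses
# finish in fewer memory-active cycles, although each individual
# miss still costs the full penalty, so AMAT is unchanged.
overlapped_apc = accesses_per_cycle(1000, 6000)

print(f"AMAT: {blocking_amat} cycles/access")
print(f"Blocking throughput:   {blocking_apc:.3f} accesses/cycle")
print(f"Overlapped throughput: {overlapped_apc:.3f} accesses/cycle")
```

Under these assumed numbers, the throughput metric rises with overlap while AMAT stays flat, illustrating why a concurrency-aware measure is needed for non-blocking memory systems.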
There are several reasons that traditional memory perfor-
mance metrics cannot characterize the overall performance of
a memory system. First, modern CPUs exploit several ILP
(Instruction Level Parallelism) technologies to overlap ALU
instruction executions and memory accesses. Out-of-order
execution overlaps CPU execution time and memory access
delay, allowing an application to hide the miss penalty of an
L1 data cache miss that hits the L2 cache. Multithreading
technology, such as SMT [5] or fine-grained multithreading
[6], can tolerate even longer misses to main memory by
• D. Wang is with the Department of Computer Science, Illinois Institute of
Technology, Chicago, IL 60616. E-mail: david.albert.wang@gmail.com.
• X.-H. Sun is with the Department of Computer Science, Illinois Institute of
Technology, Chicago, IL 60616. E-mail: sun@iit.edu.
Manuscript received 07 Dec. 2011; revised 21 Dec. 2012; accepted 04 Feb. 2013.
Date of publication 24 Feb. 2013; date of current version 27 June 2014.
Recommended for acceptance by E. Miller.
For information on obtaining reprints of this article, please send e-mail to:
reprints@ieee.org, and reference the Digital Object Identifier below.
Digital Object Identifier no. 10.1109/TC.2013.38
1626 IEEE TRANSACTIONS ON COMPUTERS, VOL. 63, NO. 7, JULY 2014
0018-9340 © 2013 IEEE. Personal use is permitted, but republication/redistribution requires IEEE permission.
See http://www.ieee.org/publications_standards/publications/rights/index.html for more information.