CONCURRENCY AND COMPUTATION: PRACTICE AND EXPERIENCE
Concurrency Computat.: Pract. Exper. 2007; 19:2185–2205
Published online 31 May 2007 in Wiley InterScience (www.interscience.wiley.com). DOI: 10.1002/cpe.1166

APEX-Map: a parameterized scalable memory access probe for high-performance computing systems

Erich Strohmaier ∗,† and Hongzhang Shan

Future Technology Group, Computational Research Division, Lawrence Berkeley National Laboratory, One Cyclotron Road, Berkeley, CA 94720, U.S.A.

∗ Correspondence to: Erich Strohmaier, Future Technology Group, Computational Research Division, Lawrence Berkeley National Laboratory, One Cyclotron Road, Berkeley, CA 94720, U.S.A.
† E-mail: estrohmaier@lbl.gov

This article is a U.S. Government work and is in the public domain in the U.S.A.

SUMMARY

The memory wall between the peak performance of microprocessors and their memory performance has become the prominent performance bottleneck for many scientific application codes. New benchmarks measuring data access speeds locally and globally in a variety of different ways are needed to explore the ever increasing diversity of architectures for high-performance computing. In this paper, we introduce a novel benchmark, APEX-Map, which focuses on global data movement and measures how fast global data can be fed into computational units. APEX-Map is a parameterized, synthetic performance probe and integrates concepts for temporal and spatial locality into its design. Our first parallel implementation in MPI and various results obtained with it are discussed in detail. By measuring the APEX-Map performance with parameter sweeps for a whole range of temporal and spatial localities, performance surfaces can be generated. These surfaces are ideally suited to study the characteristics of the computational platforms and are useful for performance comparison. Results on a global-memory vector platform and distributed-memory superscalar platforms clearly reflect the design differences between these different architectures. Published in 2007 by John Wiley & Sons, Ltd.

Received 7 April 2006; Revised 11 September 2006; Accepted 19 December 2006

KEY WORDS: performance evaluation; benchmarking; workload characterization; high-performance computing; performance modeling; data locality

1. INTRODUCTION

The memory wall has become the prominent performance bottleneck for many scientific application codes during the last few decades. However, many benchmarking efforts in scientific computing have