0-7695-1524-X/02 $17.00 (c) 2002 IEEE

SIGMA: A Simulator Infrastructure to Guide Memory Analysis

Luiz DeRose, K. Ekanadham, Jeffrey K. Hollingsworth, and Simone Sbaraglia

IBM T.J. Watson Research Center, Yorktown Heights, NY 10598, USA
Dept. of Computer Science, University of Maryland, College Park, MD 20740, USA
University of Rome “La Sapienza”, Rome, Italy

{laderose,eknath}@us.ibm.com, hollings@cs.umd.edu, sbaragli@iac.rm.cnr.it

Abstract

In this paper we present SIGMA (Simulation Infrastructure to Guide Memory Analysis), a new data collection framework and family of cache analysis tools. The SIGMA environment provides detailed cache information by gathering memory reference data using software-based instrumentation. This infrastructure can facilitate quick probing into the factors that influence the performance of an application by highlighting bottleneck scenarios, including excessive cache/TLB misses and inefficient data layouts. The tool can also assist in perturbation analysis to determine performance variations caused by changes to the architecture or the program. Our validation tests using the SPEC Swim benchmark show that most of the performance metrics obtained with SIGMA are within 1% of the metrics obtained with hardware performance counters, with the advantage that SIGMA provides performance data at the data structure level, as specified by the programmer.

1. Introduction

Understanding and tuning memory system performance is a critical issue for most scientific programs. To help programmers tune their programs, a variety of tools have been created, ranging from source code and binary analysis tools [1, 2, 3, 4, 5] to libraries and utilities that access the hardware performance counters built into microprocessors [6, 7, 8, 9]. Depending on the type of problem being studied and the stage of the tuning process (initial tuning of a new algorithm vs. fine-tuning for a specific platform), different tools are useful.
One area that has been lacking is a set of tools that allow programmers to understand the precise memory references in their program that are causing poor cache behavior. Fine-grained information such as this is useful for tuning loop kernels, understanding the cache behavior of new algorithms, and investigating how different parts of a program compete for and interact within the memory subsystem.

In this paper we present a new data collection framework and family of cache analysis tools called SIGMA (Simulation Infrastructure to Guide Memory Analysis). The goal of the SIGMA environment is to provide detailed cache information by gathering memory reference data using software-based instrumentation, and to feed that information back to programmers so that they can apply program transformations that improve cache performance. Typical tuning operations that programmers perform include padding data structures to improve cache alignment, blocking (also known as tiling) their code to provide cache re-use, and fusing loops, also to increase cache and register re-use. However, two of the challenges that users face when optimizing their applications are identifying which data structures are causing poor memory behavior and detecting which sections of the program would benefit from modifications such as blocking or fusion. The SIGMA framework provides an environment that helps users identify the data structures and code segments that are causing poor program performance due to data layout, without having to re-execute the program several times.

The SIGMA environment consists of a pre-execution tool that locates and instruments all instructions that refer to memory locations, a runtime data collection tool that performs a highly efficient lossless compression of the stream of memory addresses generated by the instrumentation, and a number of simulation and analysis tools that process the compressed memory reference trace to provide programmers with tuning information.
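To make the idea of losslessly compressing an address stream concrete, the following is a minimal sketch of one simple scheme: collapsing constant-stride runs of addresses into `(start, stride, count)` records. This is an illustration only; the text above does not specify SIGMA's compression algorithm or trace format, and the record layout and the names `compress` and `decompress` are hypothetical.

```python
from typing import Iterable, Iterator, List, Tuple

# (start_address, stride, count): a hypothetical record layout,
# not SIGMA's actual trace format.
Run = Tuple[int, int, int]


def compress(addresses: Iterable[int]) -> List[Run]:
    """Greedily collapse constant-stride runs into (start, stride, count)."""
    runs: List[Run] = []
    for addr in addresses:
        if runs:
            start, stride, count = runs[-1]
            if count == 1:
                # The second address of a run fixes its stride.
                runs[-1] = (start, addr - start, 2)
                continue
            if addr == start + stride * count:
                # The address continues the current run.
                runs[-1] = (start, stride, count + 1)
                continue
        # Otherwise start a new run of length one.
        runs.append((addr, 0, 1))
    return runs


def decompress(runs: Iterable[Run]) -> Iterator[int]:
    """Regenerate the exact original address stream."""
    for start, stride, count in runs:
        for i in range(count):
            yield start + i * stride


# A unit-stride sweep over 1000 doubles, then three reads of one scalar.
trace = [0x1000 + 8 * i for i in range(1000)] + [0x8000] * 3
runs = compress(trace)
```

Regular array sweeps, which dominate many scientific loop kernels, shrink to a single record each under this scheme, while `decompress` reproduces every address in order, so a simulator consuming the trace sees exactly what the program issued.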
We chose a post-compile instrumentation approach so that we could gather data about the actual memory references generated by optimizing compilers, rather than source instrumentation, which would gather data about the user-specified array references. The simulation and analysis tools include a TLB simulator, a data cache simulator, a data prefetcher simulator, and a query mechanism that allows users to obtain performance metrics and memory usage statistics.
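As a rough illustration of what a trace-driven data cache simulator with data-structure-level reporting can look like, the sketch below models a set-associative cache with LRU replacement and attributes each hit or miss to a label supplied with the address. It is not SIGMA's simulator: the class `CacheSim`, the helper `interleaved_miss_ratios`, the cache geometry, and the array base addresses are all assumptions chosen to keep the example small.

```python
from collections import OrderedDict, defaultdict


class CacheSim:
    """Set-associative data cache with LRU replacement (illustrative only)."""

    def __init__(self, size_bytes=32 * 1024, line_bytes=128, ways=2):
        self.line_bytes = line_bytes
        self.ways = ways
        self.num_sets = size_bytes // (line_bytes * ways)
        # One OrderedDict per set, keyed by tag, ordered from LRU to MRU.
        self.sets = [OrderedDict() for _ in range(self.num_sets)]
        self.hits = defaultdict(int)    # hits per data-structure label
        self.misses = defaultdict(int)  # misses per data-structure label

    def access(self, addr, label):
        """Simulate one reference; return True on a hit, False on a miss."""
        line = addr // self.line_bytes
        index = line % self.num_sets
        tag = line // self.num_sets
        lines = self.sets[index]
        if tag in lines:
            lines.move_to_end(tag)      # mark as most recently used
            self.hits[label] += 1
            return True
        if len(lines) >= self.ways:
            lines.popitem(last=False)   # evict the least recently used line
        lines[tag] = True
        self.misses[label] += 1
        return False

    def miss_ratio(self, label):
        total = self.hits[label] + self.misses[label]
        return self.misses[label] / total if total else 0.0


def interleaved_miss_ratios(pad_bytes):
    """Read a[i], b[i] alternately in a 1 KB direct-mapped cache.

    The arrays hold 8-byte elements; b starts exactly one cache size
    after a, plus an optional padding (all values are hypothetical).
    """
    cache = CacheSim(size_bytes=1024, line_bytes=64, ways=1)
    base_a = 0x10000
    base_b = base_a + 1024 + pad_bytes
    for i in range(64):
        cache.access(base_a + 8 * i, "a")
        cache.access(base_b + 8 * i, "b")
    return cache.miss_ratio("a"), cache.miss_ratio("b")


unpadded = interleaved_miss_ratios(0)
padded = interleaved_miss_ratios(64)
```

Under these assumed parameters the unpadded arrays map to the same sets and evict each other on every access, so both miss ratios are 1.0. Padding `b` by one 64-byte line shifts it to the neighboring set, and each array then misses only once per cache line (a ratio of 0.125). Reporting per label rather than in aggregate is what points the programmer at the specific arrays involved in the conflict.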