CONCURRENCY AND COMPUTATION: PRACTICE AND EXPERIENCE
Concurrency Computat.: Pract. Exper. 2014; 26:1328–1341
Published online 14 August 2013 in Wiley Online Library (wileyonlinelibrary.com). DOI: 10.1002/cpe.3122
SPECIAL ISSUE PAPER
A hardware counter-based toolkit for the analysis of memory accesses in SMPs
Oscar G. Lorenzo*,†, Tomás F. Pena, José C. Cabaleiro, Juan C. Pichel,
Juan A. Lorenzo and Francisco F. Rivera
Centro de Investigación en Tecnoloxías da Información (CITIUS), University of Santiago de Compostela,
Santiago de Compostela, Spain
SUMMARY
In this paper, a set of three hardware counter (HC)-based tools to characterise the memory accesses of parallel
codes in Symmetric Multiprocessors (SMPs) is presented. This toolkit simplifies accessing and programming
the HCs included in modern microprocessors. HCs are used to obtain information about the memory accesses
of a parallel code at very low cost, and this information is presented to the user in a user-friendly form.
The first tool automatically monitors the memory accesses of a system and can analyse a code even if its
source is not available. The second tool inserts into a source code, in a simple and transparent way, the
instructions needed to monitor and manage HCs, so that specific parts of the code can be analysed. The user
can either add the appropriate directives to a C code or use a graphical interface to select the parts of the
code to be analysed; the tool takes this source file and automatically adds the monitoring code. The third
tool takes the information gathered by the other two tools, processes it and displays it graphically in a
comprehensive and simple way, allowing the user to adjust the level of detail. The aim of these tools is to
characterise the memory accesses of parallel codes in multicore systems, in which the cache hierarchy can
greatly influence performance. For illustrative purposes, the tools were used to carry out two case studies,
a sparse matrix-vector product and a dot product, in two different environments. However, they can be used
in almost any system, as long as the necessary HCs are available. Copyright © 2013 John Wiley & Sons, Ltd.
Received 2 November 2012; Revised 12 July 2013; Accepted 23 July 2013
KEY WORDS: hardware counters; parallel codes; monitoring; memory hierarchy; irregular codes
1. INTRODUCTION
The behaviour of memory accesses is one of the most significant factors influencing the performance
of any code, and it becomes more relevant as the gap between processor and memory speeds, the
memory wall, widens [1]. One area where memory management and utilisation are especially important
is parallel and distributed systems, in particular current multicore architectures.
For a parallel code to execute correctly and efficiently, it must be programmed carefully.
Taking architectural features into account, particularly the behaviour of memory accesses, is critical
to improve locality among accesses and affinity between data and processors. Understanding the
performance of a program requires considering several factors, such as the underlying system or
the type of workload, which can lead to bottlenecks, or parts of the code where most of the time
*Correspondence to: Oscar G. Lorenzo, Centro de Investigación en Tecnoloxías da Información (CITIUS), University of
Santiago de Compostela, Spain.
†E-mail: oscar.garcia@usc.es