The Journal of Supercomputing, 23, 105–128, 2002
© 2002 Kluwer Academic Publishers. Manufactured in The Netherlands.
Design and Prototype of a Performance Tool
Interface for OpenMP
BERND MOHR b.mohr@fz-juelich.de
Research Centre Jülich, ZAM, Jülich, Germany
ALLEN D. MALONY AND SAMEER SHENDE {malony,sameer}@cs.uoregon.edu
Department of Computer and Information Science, University of Oregon
FELIX WOLF f.wolf@fz-juelich.de
Research Centre Jülich, ZAM, Jülich, Germany
Abstract. This paper proposes a performance tools interface for OpenMP, similar in spirit to the
MPI profiling interface in its intent to define a clear and portable API that makes OpenMP execution
events visible to runtime performance tools. We present our design using a source-level instrumentation
approach based on OpenMP directive rewriting. Rules to instrument each directive and their combina-
tion are applied to generate calls to the interface consistent with directive semantics and to pass context
information (e.g., source code locations) in a portable and efficient way. Our proposed OpenMP per-
formance API further allows user functions and arbitrary code regions to be marked and performance
measurement to be controlled using new OpenMP directives. To prototype the proposed OpenMP per-
formance interface, we have developed compatible performance libraries for the Expert automatic
event trace analyzer [17, 18] and the Tau performance analysis framework [13]. The directive instrumen-
tation transformations we define are implemented in a source-to-source translation tool called Opari.
Application examples are presented for both Expert and Tau to show the OpenMP performance inter-
face and Opari instrumentation tool in operation. When used together with the MPI profiling interface
(as the examples also demonstrate), our proposed approach provides a portable and robust solution to
performance analysis of OpenMP and mixed-mode (OpenMP + MPI) applications.
Keywords: performance analysis, parallel programming, OpenMP
1. Introduction
With the advent of any proposed language system for expressing parallel operation (whether as a true parallel language (e.g., ZPL [6]), parallel extensions to a sequential language (e.g., UPC [4]), or parallel compiler directives (e.g., HPF [9])),
questions soon arise regarding how performance instrumentation and measurement
will be conducted, and how performance data will be analyzed and mapped to the
language-level (high-level) parallel abstractions. Several issues make this an interest-
ing problem. First, the language system implements a model for parallelism whose
explicit parallel operation is generally hidden from the programmer. As such, par-
allel performance events may not be accessible directly, requiring instead support
from underlying runtime software to observe them in full. When such support is
unavailable, performance must be inferred from model properties. Second, the lan-
guage system typically transforms the program into its parallel executable form,
making it necessary to track code transformations closely so that performance data