The Journal of Supercomputing, 23, 105–128, 2002
© 2002 Kluwer Academic Publishers. Manufactured in The Netherlands.

Design and Prototype of a Performance Tool Interface for OpenMP

BERND MOHR (b.mohr@fz-juelich.de)
Research Centre Jülich, ZAM, Jülich, Germany

ALLEN D. MALONY AND SAMEER SHENDE ({malony, sameer}@cs.uoregon.edu)
Department of Computer and Information Science, University of Oregon

FELIX WOLF (f.wolf@fz-juelich.de)
Research Centre Jülich, ZAM, Jülich, Germany

Abstract. This paper proposes a performance tools interface for OpenMP, similar in spirit to the MPI profiling interface in its intent to define a clear and portable API that makes OpenMP execution events visible to runtime performance tools. We present our design using a source-level instrumentation approach based on OpenMP directive rewriting. Rules to instrument each directive and their combination are applied to generate calls to the interface consistent with directive semantics and to pass context information (e.g., source code locations) in a portable and efficient way. Our proposed OpenMP performance API further allows user functions and arbitrary code regions to be marked and performance measurement to be controlled using new OpenMP directives. To prototype the proposed OpenMP performance interface, we have developed compatible performance libraries for the Expert automatic event trace analyzer [17, 18] and the Tau performance analysis framework [13]. The directive instrumentation transformations we define are implemented in a source-to-source translation tool called Opari. Application examples are presented for both Expert and Tau to show the OpenMP performance interface and Opari instrumentation tool in operation. When used together with the MPI profiling interface (as the examples also demonstrate), our proposed approach provides a portable and robust solution to performance analysis of OpenMP and mixed-mode (OpenMP + MPI) applications.
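The directive-rewriting approach described in the abstract can be sketched as follows. Note that the pomp_* entry points, their signatures, and the source-location string are hypothetical placeholders chosen for illustration; the actual API proposed in the paper is defined later. The sketch only shows the shape of the transformation: fork/join events are emitted once around the construct, and begin/end events bracket the region body so that every thread reports them.

```c
#include <assert.h>

/* Hypothetical instrumentation stubs standing in for the proposed
 * performance interface; a real tool library would record timestamps,
 * thread IDs, and the passed source-location context. */
static int events = 0;  /* counts instrumentation events fired */

static void pomp_parallel_fork(const char *loc)  { (void)loc; events++; }
static void pomp_parallel_begin(const char *loc) {
    (void)loc;
    /* executed by every thread entering the parallel region */
    #pragma omp atomic
    events++;
}
static void pomp_parallel_end(const char *loc) {
    (void)loc;
    #pragma omp atomic
    events++;
}
static void pomp_parallel_join(const char *loc)  { (void)loc; events++; }

/* Original user code:
 *     #pragma omp parallel
 *     { work(); }
 *
 * After source-to-source rewriting, interface calls surround the
 * construct (fork/join, master thread only) and bracket the region
 * body (begin/end, once per thread), passing source context: */
void run_instrumented(void)
{
    pomp_parallel_fork("demo.c:17");   /* "demo.c:17" is illustrative */
    #pragma omp parallel
    {
        pomp_parallel_begin("demo.c:17");
        /* ... user work ... */
        pomp_parallel_end("demo.c:17");
    }
    pomp_parallel_join("demo.c:17");
}
```

If compiled without OpenMP support, the pragmas are ignored and the instrumented region still fires exactly one fork, begin, end, and join event; with OpenMP enabled, begin/end fire once per team thread.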
Keywords: performance analysis, parallel programming, OpenMP

1. Introduction

With the advent of any proposed language system for expressing parallel operation (whether as a true parallel language (e.g., ZPL [6]), parallel extensions to a sequential language (e.g., UPC [4]), or parallel compiler directives (e.g., HPF [9])), questions soon arise regarding how performance instrumentation and measurement will be conducted, and how performance data will be analyzed and mapped to the language-level (high-level) parallel abstractions. Several issues make this an interesting problem. First, the language system implements a model for parallelism whose explicit parallel operation is generally hidden from the programmer. As such, parallel performance events may not be accessible directly, requiring instead support from underlying runtime software to observe them in full. When such support is unavailable, performance must be inferred from model properties. Second, the language system typically transforms the program into its parallel executable form, making it necessary to track code transformations closely so that performance data