Cluster Comput (2008) 11: 57–73
DOI 10.1007/s10586-007-0051-6

Integrated parallel performance views

Aroon Nataraj · Allen D. Malony · Sameer Shende · Alan Morris

Received: 16 March 2007 / Accepted: 29 October 2007 / Published online: 27 November 2007
© Springer Science+Business Media, LLC 2007

Abstract The influences of the operating system and system-specific effects on application performance are increasingly important considerations in high-performance computing. OS kernel measurement is key to understanding these performance influences and the interrelationship of system and user-level performance factors. The KTAU (Kernel TAU) methodology and Linux-based framework provide parallel kernel performance measurement from both a kernel-wide and a process-centric perspective. The first characterizes overall aggregate kernel performance for the entire system. The second characterizes kernel performance when it runs in the context of a particular process. KTAU extends the TAU performance system with kernel-level monitoring, while leveraging TAU's measurement and analysis capabilities. We explain the rationale and motivations behind our approach, describe the KTAU design and implementation, and show working examples on multiple platforms demonstrating the versatility of KTAU in integrated system/application monitoring.

Keywords Parallel performance · Kernel · Linux · Instrumentation · Measurement

A. Nataraj · A.D. Malony · S. Shende · A. Morris
Department of Computer and Information Science, University of Oregon, Eugene, OR, USA
e-mail: anataraj@cs.uoregon.edu

A.D. Malony
e-mail: malony@cs.uoregon.edu

S. Shende
e-mail: sameer@cs.uoregon.edu

A. Morris
e-mail: amorris@cs.uoregon.edu

1 Introduction

The performance of parallel applications on high-performance computing (HPC) systems is a consequence of the user-level execution of the application code and system-level (kernel) operations that occur while the application is running.
As HPC systems evolve towards ever larger and more integrated parallel environments, the ability to observe all performance factors, their relative contributions, and their interrelationships will become important to comprehensive performance understanding. Unfortunately, most parallel performance tools operate only at the user level, leaving kernel-level performance artifacts obscure. OS factors causing application performance bottlenecks, such as those demonstrated in [1, 2], are difficult to assess by user-level measurements alone. An integrated methodology and framework to observe OS actions relative to application activities and performance has yet to be fully developed. At a minimum, such an approach will require OS kernel performance monitoring.

The OS influences application performance both directly and indirectly. Actions the OS takes independent of the application's execution have indirect effects on its performance. In contrast, we say that the OS directly influences application performance when its actions are a result of or in support of application execution. This motivates two different monitoring perspectives of OS performance in order to understand the effects. One perspective is to view the entire kernel operation as a whole, aggregating performance data from all active processes in the system and including the activities of the OS when servicing system calls made by applications as well as activities not directly related to applications (such as servicing hardware interrupts or keeping time). We will refer to this as the kernel-wide perspective,