Cluster Comput (2008) 11: 57–73
DOI 10.1007/s10586-007-0051-6
Integrated parallel performance views
Aroon Nataraj · Allen D. Malony · Sameer Shende ·
Alan Morris
Received: 16 March 2007 / Accepted: 29 October 2007 / Published online: 27 November 2007
© Springer Science+Business Media, LLC 2007
Abstract The influences of the operating system and sys-
tem-specific effects on application performance are increas-
ingly important considerations in high performance comput-
ing. OS kernel measurement is key to understanding the per-
formance influences and the interrelationship of system and
user-level performance factors. The KTAU (Kernel TAU)
methodology and Linux-based framework provide parallel
kernel performance measurement from both a kernel-wide
and a process-centric perspective. The kernel-wide perspective
characterizes overall aggregate kernel performance for the entire
system. The process-centric perspective characterizes kernel performance when it
runs in the context of a particular process. KTAU extends
the TAU performance system with kernel-level monitoring,
while leveraging TAU’s measurement and analysis capabilities.
We explain the rationale and motivations behind our approach,
describe the KTAU design and implementation, and
show working examples on multiple platforms demonstrat-
ing the versatility of KTAU in integrated system/application
monitoring.
Keywords Parallel performance · Kernel · Linux ·
Instrumentation · Measurement
A. Nataraj (✉) · A.D. Malony · S. Shende · A. Morris
Department of Computer and Information Science, University of
Oregon, Eugene, OR, USA
e-mail: anataraj@cs.uoregon.edu
A.D. Malony
e-mail: malony@cs.uoregon.edu
S. Shende
e-mail: sameer@cs.uoregon.edu
A. Morris
e-mail: amorris@cs.uoregon.edu
1 Introduction
The performance of parallel applications on high-perfor-
mance computing (HPC) systems is a consequence of the
user-level execution of the application code and system-
level (kernel) operations that occur while the application is
running. As HPC systems evolve towards ever larger and
more integrated parallel environments, the ability to ob-
serve all performance factors, their relative contributions
and interrelationship, will become important to comprehen-
sive performance understanding. Unfortunately, most paral-
lel performance tools operate only at the user-level, leaving
the kernel-level performance artifacts obscure. OS factors
causing application performance bottlenecks, such as those
demonstrated in [1, 2], are difficult to assess by user-level
measurements alone. An integrated methodology and frame-
work to observe OS actions relative to application activities
and performance has yet to be fully developed. Minimally,
such an approach will require OS kernel performance mon-
itoring.
The OS influences application performance both directly
and indirectly. Actions the OS takes independent of the
application’s execution have indirect effects on its perfor-
mance. In contrast, we say that the OS directly influences
application performance when its actions are a result of or
in support of application execution. This motivates two dif-
ferent monitoring perspectives of OS performance in order
to understand the effects. One perspective is to view the en-
tire kernel operation as a whole, aggregating performance
data from all active processes in the system and including
the activities of the OS when servicing system-calls made
by applications as well as activities not directly related to ap-
plications (such as servicing hardware interrupts or keeping
time). We will refer to this as the kernel-wide perspective,