Modeling, Evaluation, and Testing of Paradyn Instrumentation System*

Abdul Waheed and Diane T. Rover
Department of Electrical Engineering
Michigan State University
E-mail: {waheed,rover}@egr.msu.edu

Jeffrey K. Hollingsworth
Department of Computer Science
University of Maryland
E-mail: hollings@cs.umd.edu

* This work was supported in part by DARPA contract No. DABT 63-95-C-0072 and National Science Foundation grant ASC-9624149.

Abstract

This paper presents a case study of modeling, evaluating, and testing the data collection services (called an instrumentation system) of the Paradyn parallel performance measurement tool. The overall objective of the study is to use modeling- and simulation-based evaluation to provide feedback to the tool developers. Early feedback on instrumentation system overhead and performance helps the developers choose appropriate system configurations and task scheduling policies. We develop a resource occupancy model for the Paradyn instrumentation system (IS) and parameterize it for an IBM SP-2 platform using a measurement-based workload characterization. We then use the model to answer several “what if” questions about two policies for scheduling instrumentation system tasks, collect-and-forward (CF) and batch-and-forward (BF), on three types of architectures: networks of workstations (NOW), shared memory multiprocessors (SMP), and massively parallel processing (MPP) systems. In addition to comparing the two scheduling policies, the study investigates two options for forwarding instrumentation data in the MPP system: direct forwarding and binary tree forwarding. Simulation results indicate that the BF policy can significantly reduce the overheads. Based on this feedback, the BF policy was implemented in the Paradyn IS as an option for managing data collection. Measurement-based testing results obtained from this enhanced version of the Paradyn IS are reported in this paper. They show a reduction of more than 60% in the direct IS overheads when the BF policy is used.

1 Introduction

Application-level software instrumentation systems (ISs) collect runtime information from parallel and distributed systems. This information is collected to serve various purposes, for example, evaluation of program execution on high performance computing and communication (HPCC) systems [23], monitoring of distributed real-time control systems [3,10], resource