A HYBRID TOOL FOR THE PERFORMANCE EVALUATION OF NUMA ARCHITECTURES

James Westall
Robert Geist

Department of Computer Science
Clemson University
Clemson, SC 29634–1906, U.S.A.

ABSTRACT

We present a system for describing and solving closed queuing network models of the memory access performance of NUMA architectures. The system consists of a model description language, solver engines based upon both discrete event simulation and derivatives of the Mean Value Analysis (MVA) algorithm, and a model manager used to translate model descriptions to the forms required by the solvers. A single model description file is used to describe the essential elements that characterize the NUMA system and its workload. During a single simulation or MVA modeling run it is easy to dynamically vary elements of the system model, such as mean device service times, elements of the workload model, such as cache miss rates, or both. Using the extremely fast, but approximate, MVA solvers to interpolate between design points computed by the slower simulator allows the analyst to obtain detailed and accurate results in minimal time.

Keywords: modeling methodology; discrete event simulation; mean value analysis; NUMA architectures.

1 INTRODUCTION

While small-scale shared memory multiprocessors, consisting of tens of processors attached to a single shared bus, have been available for many years, large-scale systems with shared memory and scalable interconnection structures are relatively new (Lenowski and Weber, 1995). Currently available systems of this type include the Cray T3D, HP-Convex Exemplar SPP 1200, and Encore GigaMax. In these large-scale systems, memories are logically shared but physically distributed. Thus, the traditional assumption of constant access time to main memory is no longer valid, and so performance prediction tools must be adjusted to account for non-uniform memory access (NUMA) times.
Because of hardware-imposed requirements for simultaneous possession of multiple resources, queueing network models have not been well suited for the performance evaluation of traditional shared memory multiprocessors. The use of split transaction bus architecture with queueing in the interconnection network makes NUMA systems somewhat more amenable to the use of queueing network based performance tools. However, some significant obstacles to their use must be overcome.

While queuing network models (Lazowska, Zahorjan, Graham, and Sevcik, 1984; Trivedi, 1983) can provide extremely fast performance evaluation, analytic solutions of such models are provided under an assumption that network devices are stochastically equivalent to first-come, first-served (FCFS) servers with exponentially distributed service times. Mean-value analysis (MVA) (Lazowska, Zahorjan, Graham, and Sevcik, 1984) is an easy to implement and widely used technique for solving closed queuing networks that satisfy these assumptions.

The MVA algorithm works as follows. For a closed network of $M$ servers, let $q_{i,j}$ denote the probability that a request leaving server $i$ next visits server $j$. If $Q = (q_{i,j})$ is the $M \times M$ matrix of such probabilities, then any vector solution, $\lambda$, to

$$\lambda = \lambda Q \tag{1}$$

contains the relative number of visits to each node in the steady-state. It is only relative because (1) contains one degree of freedom. The mean network performance measures are then completely determined by the vector $\lambda$ and the mean service times at the servers.

It turns out that expected response time at server $i$ with $n$ customers in the network, $E[R_i(n)]$, is related to expected customer population at that node, $E[N_i(n)]$, and expected service time at the node, $E[S_i]$, by a simple formula:

$$E[R_i(n)] = E[S_i]\,\bigl(1 + E[N_i(n-1)]\bigr) \tag{2}$$

Proceedings of the 1997 Winter Simulation Conference, ed. S. Andradóttir, K. J. Healy, D. H. Withers, and B. L. Nelson
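As an illustration of how (1) and (2) fit together, the following Python sketch computes visit ratios from a routing matrix and then applies the response time formula (2) for populations 1 through n. It is a minimal sketch under stated assumptions, not the solver described in this paper: the function names `visit_ratios` and `mva` are hypothetical, the scale of the visit ratios is fixed by setting the first entry to 1, the routing matrix is assumed to be irreducible, and the step that closes the recursion (Little's law applied to the whole network and then to each server) is the standard single-class exact MVA step rather than something stated in the text above. The three-server network in the usage example is invented purely for demonstration.

```python
def visit_ratios(Q, max_iter=10_000, tol=1e-12):
    """Return a solution of lam = lam * Q, scaled so that lam[0] == 1.

    Uses the damped iteration lam <- (lam + lam*Q) / 2, which also
    converges for periodic routing (e.g. a two-server cycle).
    Assumes Q is row-stochastic and irreducible.
    """
    m = len(Q)
    lam = [1.0 / m] * m
    for _ in range(max_iter):
        step = [sum(lam[i] * Q[i][j] for i in range(m)) for j in range(m)]
        new = [0.5 * (a + b) for a, b in zip(lam, step)]
        converged = max(abs(a - b) for a, b in zip(new, lam)) < tol
        lam = new
        if converged:
            break
    # Equation (1) fixes lam only up to a scalar; pin the scale here.
    return [x / lam[0] for x in lam]


def mva(visits, service, customers):
    """Single-class exact MVA for FCFS exponential servers.

    visits[i]  : relative visit count of server i (a solution of (1))
    service[i] : mean service time E[S_i]
    Returns (response times, mean queue lengths, throughput) at the
    requested population; throughput is relative to the visit scaling.
    """
    if customers < 1:
        raise ValueError("customers must be at least 1")
    m = len(visits)
    queue = [0.0] * m  # E[N_i(0)] = 0: an empty network has no queues
    for n in range(1, customers + 1):
        # Equation (2): an arrival sees the network with n - 1 customers.
        resp = [service[i] * (1.0 + queue[i]) for i in range(m)]
        # Little's law for the whole network (assumed closing step).
        throughput = n / sum(visits[i] * resp[i] for i in range(m))
        # Little's law per server gives E[N_i(n)] for the next round.
        queue = [throughput * visits[i] * resp[i] for i in range(m)]
    return resp, queue, throughput


if __name__ == "__main__":
    # Hypothetical network: server 0 routes to servers 1 and 2 with
    # equal probability, and both always route back to server 0.
    Q = [[0.0, 0.5, 0.5],
         [1.0, 0.0, 0.0],
         [1.0, 0.0, 0.0]]
    lam = visit_ratios(Q)
    resp, queue, x = mva(lam, service=[1.0, 2.0, 2.0], customers=4)
    print("visit ratios:", lam)
    print("response times:", resp)
    print("queue lengths:", queue)
    print("throughput:", x)
```

The damped iteration is used instead of a plain power iteration because closed networks often have periodic routing, for which the undamped iteration need not converge. A direct linear solve would serve equally well.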