Runtime Support for Multigrain and Multiparadigm Parallelism

Panagiotis E. Hadjidoukas, Eleftherios D. Polychronopoulos, and Theodore S. Papatheodorou

High Performance Computing Information Systems Laboratory, Department of Computer Engineering and Informatics, University of Patras, Rio 26500, Patras, Greece
{peh,edp,tsp}@hpclab.ceid.upatras.gr
http://www.hpclab.ceid.upatras.gr

Abstract. This paper presents a general methodology for implementing on clusters the runtime support for a two-level dependence-driven thread model that was initially targeted at shared-memory multiprocessors. The general idea is to exploit existing programming solutions for these architectures, such as Software DSM (SWDSM) and the Message Passing Interface (MPI). The internal runtime system structures and the dependence-driven multilevel parallelism are managed with explicit messages, while the shared-memory hardware of the available SMP nodes is still exploited whenever possible. The underlying programming models and hybrid programming solutions are not excluded, with threads used for intra-node parallelism. The use of shared virtual memory for thread stacks, together with a translator that allocates Fortran77 common blocks in shared memory, enables the execution of unmodified OpenMP codes on clusters of SMPs. Initial performance results demonstrate efficient support for fork-join and multilevel parallelism on top of SWDSM and MPI, and confirm the benefits of explicit, though transparent, message passing.

1 Introduction

Because clusters of multiprocessor nodes are an attractive platform for high-end scientific computing, the need for two-level thread models has emerged. Currently, message passing, standardized with MPI [8], and a shared address space, implemented in software with page-based Shared Virtual Memory (SVM) protocols, are the two leading programming paradigms for these systems.
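To make the notion of a dependence-driven thread model from the abstract more concrete, the following is a minimal single-node sketch, assuming a task graph in which each task keeps a counter of unfinished predecessors and becomes ready when that counter reaches zero. The names `Task`, `depends_on`, and `run_dependence_driven` are hypothetical and are not the API of the runtime described in this paper; Python threads merely stand in for intra-node worker threads, and none of the cluster-level messaging or SVM support is modeled.

```python
import threading
from collections import deque


class Task:
    """A unit of work gated by a count of unfinished predecessors (hypothetical)."""

    def __init__(self, name, fn):
        self.name = name
        self.fn = fn
        self.successors = []  # tasks that wait on this one
        self.pending = 0      # predecessors that have not completed yet


def depends_on(task, *preds):
    """Declare that `task` may run only after every task in `preds` has finished."""
    for p in preds:
        p.successors.append(task)
        task.pending += 1


def run_dependence_driven(tasks, num_workers=4):
    """Run an acyclic task graph with a pool of worker threads.

    A task is placed on the shared ready queue once its pending counter
    drops to zero. Returns the names of the tasks in completion order.
    """
    cond = threading.Condition(threading.Lock())
    ready = deque(t for t in tasks if t.pending == 0)
    remaining = len(tasks)
    order = []

    def worker():
        nonlocal remaining
        while True:
            with cond:
                # Sleep until some task is ready or the whole graph is done.
                while not ready and remaining > 0:
                    cond.wait()
                if remaining == 0:
                    return
                task = ready.popleft()
            task.fn()  # execute the task body outside the lock
            with cond:
                order.append(task.name)
                remaining -= 1
                # Resolve dependences: release successors whose counter hits zero.
                for s in task.successors:
                    s.pending -= 1
                    if s.pending == 0:
                        ready.append(s)
                cond.notify_all()

    workers = [threading.Thread(target=worker) for _ in range(num_workers)]
    for w in workers:
        w.start()
    for w in workers:
        w.join()
    return order


# Usage: a fork-join pattern expressed purely through dependences.
results = []
fork = Task("fork", lambda: results.append("fork"))
kids = [Task(f"work{i}", lambda i=i: results.append(i * i)) for i in range(4)]
join = Task("join", lambda: results.append("join"))
for k in kids:
    depends_on(k, fork)
depends_on(join, *kids)
order = run_dependence_driven([fork, *kids, join], num_workers=2)
print(order[0], order[-1])  # prints "fork join"
```

In this sketch, fork-join parallelism needs no separate barrier primitive: the join task simply depends on every child, so it cannot enter the ready queue before all of them complete. The sketch assumes an acyclic graph and task bodies that do not raise; a cycle or a failing task would leave the remaining workers waiting.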
S. Sahni et al. (Eds.): HiPC 2002, LNCS 2552, pp. 184–194, 2002.
© Springer-Verlag Berlin Heidelberg 2002

An alternative programming solution is the hybrid programming model [2], which combines MPI for the outer level with multithreading inside each node. In this paper, we present a general approach for implementing the runtime support of the Nanothreads Programming Model (NPM) [11], a two-level thread model, on clusters of compute nodes. The idea is to exploit existing programming solutions for these architectures and to use messaging extensively, so as to minimize the dependence on SVM whenever hardware shared memory is not available. Major architectural features of the runtime system include the adoption of a lazy