Paradigms for Parallel Computation

Gil Speyer, Natalie Freed, Richard Akis, and Dan Stanzione
Arizona State University, Tempe, AZ
{speyer, nafreed, rakis, dstanzi}@asu.edu

Eric Mack
High Performance Technologies Inc., Aberdeen Proving Ground, MD
emack@hpti.com

Abstract

Message passing, as implemented in the Message Passing Interface (MPI), has become the industry standard for programming distributed memory parallel architectures, while threading on shared memory machines is typically implemented in OpenMP. Outstanding performance has been achieved with these methods, but only on a relatively small number of codes, and only with expert tuning, particularly as system size increases. With the advent of multicore/manycore microprocessors and continued growth in system size, an inflection point may be nearing that will require a substantial shift in the way large-scale systems are programmed if productivity is to be maintained. The parallel paradigms project was undertaken to evaluate the near-term readiness of a number of emerging ideas in parallel programming, with specific emphasis on their applicability to applications in the User Productivity Enhancement and Technology Transfer (PET) Electronics, Networking, and Systems/C4I (ENS) focus area. The evaluation included examinations of usability, performance, scalability, and support for fault tolerance. Applications representative of ENS problems were ported to each of the evaluated languages, along with a set of “dwarf” codes representing a broader workload. In addition, a user study was undertaken in which teaching modules were developed for each paradigm and delivered to groups of both novice and expert programmers to measure productivity. Results will be presented for six paradigms currently under evaluation, including the performance of applications re-coded across these models and feedback from users.

Index Terms: Parallel processing, languages

1. Introduction

For the last decade, high performance computing has been dominated by distributed memory parallel architectures. The most common model for programming these systems has been message passing. In the early 1990s, the MPI (Message Passing Interface) standard was created by a consortium of academic, government, and industry participants (including the Department of Defense [DoD]) to unify the myriad vendor-specific message-passing models that existed. At about the same time, the OpenMP standard was defined for threading on shared memory architectures. Since its definition, MPI has been the de facto standard for very large scale applications, used by virtually every large code. MPI codes have been used to set each new record in performance, including breaking the teraflop and hundred-teraflop barriers.

The state of affairs at this time is that codes written in MPI will continue to scale, but the range of applications capable of achieving this scale will remain small, and the effort required to maintain scaling will continue to increase. The real barrier to scalability appears to be the productivity of parallel programmers as the system complexity they face increases. While MPI can exceed the performance of all other techniques, a heroic programming effort is required to achieve this performance; an effort of many years is typical for truly scalable, complex scientific applications.

Two technology trends are currently exacerbating this problem. First, the emergence of multicore microprocessors has led to the development of MPI-OpenMP hybrid codes, which use OpenMP for shared memory within the 8–16 (soon to be more) processors of a node and MPI for distributed memory between nodes. This hybrid model presents greater challenges for programmers than either model did alone.
Second, the sheer scale of modern high performance computing (HPC) systems, with thousands of nodes and tens of thousands of processor cores becoming more common (and hundreds of thousands of cores just a year

DoD HPCMP Users Group Conference 2008 · 978-0-7695-3515-9/08 © 2008 IEEE · DOI 10.1109/DoD.HPCMP.UGC.2008.18