M. Bubak et al. (Eds.): ICCS 2008, Part I, LNCS 5101, pp. 20–31, 2008.
© Springer-Verlag Berlin Heidelberg 2008

Integrated Data and Task Management for Scientific Applications

Jarek Nieplocha(1), Sriram Krishnamoorthy(1), Marat Valiev(1), Manoj Krishnan(1), Bruce Palmer(1), and P. Sadayappan(2)

(1) Pacific Northwest National Laboratory, Richland, WA 99352, USA
    {jarek.nieplocha,sriram,marat.valiev,manoj,bruce.palmer}@pnl.gov
(2) The Ohio State University, Columbus, OH 43210, USA
    saday@cse.ohio-state.edu

Abstract. Several emerging application areas require intelligent management of distributed data and of tasks that encapsulate execution units for collections of processors or processor groups. This paper describes an integration of data and task parallelism that addresses the needs of such applications in the context of the Global Array (GA) programming model. GA provides programming interfaces for managing shared arrays based on non-partitioned global address space programming model concepts. Compatibility with MPI enables the scientific programmer to benefit from the performance and productivity advantages of these high-level programming abstractions using standard programming languages and compilers.

Keywords: Global Array programming, computational kernels, MPI, task management, data management.

1 Introduction

Since the dawn of distributed-memory parallel computers, developing application codes for such architectures has been a challenging task. The parallelism in the underlying problem must be identified and exposed; the data and computation must then be partitioned and mapped onto processors to achieve load balance; and finally, the interprocessor communication required to exchange data between individual processors must be carefully orchestrated to avoid deadlocks and unnecessary delays. When communication costs cannot be completely eliminated, alternative approaches should be taken to minimize the adverse effects of communication on scalability.
Achieving good performance and scalability has also required knowledge of the network topology and memory hierarchy, and even some understanding of the underlying implementation of the communication libraries. Factors such as these have made parallel programming a difficult task, leaving it to a limited number of expert scientific programmers.