Strategies and Implementation for Translating OpenMP Code for Clusters

Deepak Eachempati, Lei Huang, Barbara Chapman
Department of Computer Science
University of Houston
Houston, TX 77204-3475
{dreachem, chapman}@cs.uh.edu, lhuang5@mail.uh.edu

Abstract. OpenMP is a portable shared memory programming interface that promises high programmer productivity for multithreaded applications. It is designed for small and medium-sized shared memory systems. We have developed strategies to extend OpenMP to clusters via compiler translation to a Global Arrays program. In this paper, we describe our implementation of the translation in the Open64 compiler, focusing on strategies to improve the translation of sequential regions. Our work is based upon the open-source Open64 compiler suite for C, C++, and Fortran 90/95.

1 Introduction

MPI is still the most popular and successful programming model for clusters. It is, however, error-prone and too complex for most non-experts. OpenMP is a programming model designed for shared memory systems that provides simple syntax to achieve easy-to-use, incremental parallelism and portability. However, it is not available for distributed memory systems, including widely deployed clusters. We believe that it is feasible to use compiler technology to extend OpenMP to clusters and thereby reduce the programming effort they require. We have developed strategies [7][8][12] to implement this translation via Global Arrays (GA) [15].

GA is a library that provides an asynchronous, one-sided, virtual shared memory programming environment for clusters. A GA program consists of a collection of independently executing processes, each of which is able to access data declared to be shared without interfering with other processes. GA enables us to retain the shared memory abstraction while making all communication explicit, thus enabling the compiler to control its location and content.
Considerable effort has been put into the efficient implementation of GA's one-sided contiguous and strided communications. Therefore, we can potentially generate GA code that will execute with high efficiency using our translation strategy. On the other hand, our strategy shares some of the problems associated with the traditional SDSM approach to translating OpenMP for clusters: in particular, the high cost of global synchronization, and the difficulty of translating the sequential parts of a program.
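To make the target programming model concrete, the following is a minimal sketch of a hand-written GA program, not the code our compiler actually generates. The array name, size, and initialization are illustrative assumptions; the calls (NGA_Create, NGA_Distribution, NGA_Put, GA_Sync) are part of the standard GA C interface, and building it requires an installed GA/MPI toolchain.

```c
/* Illustrative GA sketch: one shared array, explicit one-sided
   communication, and a global synchronization point.
   Compile with the GA toolchain, e.g.: mpicc ga_sketch.c -lga -larmci */
#include <stdlib.h>
#include <mpi.h>
#include "ga.h"
#include "macdecls.h"

int main(int argc, char **argv) {
    MPI_Init(&argc, &argv);
    GA_Initialize();

    int me = GA_Nodeid();           /* this process's rank               */
    int dims[1]  = {1000};          /* illustrative global array size    */
    int chunk[1] = {-1};            /* let GA choose the distribution    */
    int g_a = NGA_Create(C_DBL, 1, dims, "A", chunk);

    /* Query the locally owned block, then write it with a one-sided put;
       no matching receive is needed on any other process.              */
    int lo[1], hi[1], ld[1] = {1};
    NGA_Distribution(g_a, me, lo, hi);
    int n = hi[0] - lo[0] + 1;
    double *buf = malloc(n * sizeof(double));
    for (int i = 0; i < n; i++)
        buf[i] = (double)(lo[0] + i);
    NGA_Put(g_a, lo, hi, buf, ld);

    GA_Sync();  /* global synchronization: the costly operation the
                   translation strategy tries to minimize             */

    free(buf);
    GA_Destroy(g_a);
    GA_Terminate();
    MPI_Finalize();
    return 0;
}
```

Because every put and get is an explicit library call rather than an implicit page fault (as in SDSM systems), the compiler can decide exactly where communication happens and how much data it moves.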