Atomic Page Update Methods for OpenMP-Aware Software DSM

Yang-Suk Kee, Institute of Computer Technology, Seoul National University, yskee@iris.snu.ac.kr
Jin-Soo Kim, Division of Computer Science, KAIST, jinsoo@cs.kaist.ac.kr
Woo-Chul Jeun, Soonhoi Ha, School of Computer Science and Engineering, Seoul National University, {wcjeun,sha}@iris.snu.ac.kr

Abstract

When software distributed shared memory (SDSM) is extended to use threads in support of OpenMP, one challenge is how to preserve memory consistency in a thread-safe way, which is known as the “atomic page update problem”. In this paper, we show that this problem can be solved by creating two independent access paths to a physical page and assigning different access permissions to them. In particular, we discuss three new methods, which use System V shared memory IPC, a new mdup() system call, and the fork() system call, as well as a known method that uses file mapping. The main contribution of this paper is to introduce various solutions to the atomic page update problem and to compare their characteristics extensively. Experiments carried out on a Linux-based cluster of SMPs and an IBM SP Nighthawk system show that the proposed methods achieve better performance than the file mapping method, and that the method using the process creation mechanism is the best candidate for the IBM SP system.

1. Introduction

OpenMP [1] is becoming the de facto standard for the shared-address-space programming model. In addition to the ease of programming inherent in the shared-address-space model, OpenMP is expected to deliver high performance in scientific applications. Even though the general target architecture of OpenMP is a single multiprocessor node, the model is also applicable to a cluster of multiprocessors. An intuitive way to extend OpenMP to a cluster of multiprocessors is to use software distributed shared memory (SDSM), which emulates a shared address space over distributed memories.
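The idea summarized in the abstract, two access paths to one physical page with different access permissions, can be illustrated with System V shared memory, one of the mechanisms the paper names. The following is only a minimal sketch under our own assumptions and is not the authors' implementation: the function name `dual_mapping_demo`, the single-page segment, and the use of mprotect() to switch the application view between invalid and readable are all illustrative choices.

```c
#include <assert.h>
#include <string.h>
#include <sys/ipc.h>
#include <sys/mman.h>
#include <sys/shm.h>
#include <unistd.h>

/* Hypothetical helper (not from the paper): attaches one System V
 * segment at two virtual addresses and updates the page through the
 * writable view while the application view is inaccessible.
 * Returns 0 on success and -1 on failure. */
int dual_mapping_demo(void)
{
    long page_size = sysconf(_SC_PAGESIZE);
    assert(page_size > 0);

    /* One physical page, private to this process and its children. */
    int shmid = shmget(IPC_PRIVATE, (size_t)page_size, IPC_CREAT | 0600);
    if (shmid == -1)
        return -1;

    /* System path: always read-write, used only by the SDSM runtime. */
    char *sys_view = shmat(shmid, NULL, 0);
    /* Application path: attached read-only at a different address. */
    char *app_view = shmat(shmid, NULL, SHM_RDONLY);

    /* Mark the segment for removal; it is destroyed after the last detach. */
    shmctl(shmid, IPC_RMID, NULL);

    if (sys_view == (char *)-1 || app_view == (char *)-1) {
        if (sys_view != (char *)-1)
            shmdt(sys_view);
        if (app_view != (char *)-1)
            shmdt(app_view);
        return -1;
    }

    int result = -1;

    /* The page is invalid from the application's point of view. */
    if (mprotect(app_view, (size_t)page_size, PROT_NONE) == 0) {
        /* Update the page through the system path. The application
         * view stays inaccessible, so no application thread can
         * observe a partially updated page. */
        strcpy(sys_view, "valid page contents");

        /* Publish the page: the application view becomes readable
         * only after the update is complete. */
        if (mprotect(app_view, (size_t)page_size, PROT_READ) == 0 &&
            sys_view != app_view &&
            strcmp(app_view, "valid page contents") == 0)
            result = 0;
    }

    shmdt(sys_view);
    shmdt(app_view);
    return result;
}
```

Calling `dual_mapping_demo()` from a `main()` function should return 0 on a system where System V shared memory is available. The point of the design is that the permission of the application view never has to be relaxed to writable while an update is in progress.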
Many SDSM systems are implemented at user level by using page fault handling mechanisms, assuming uniprocessor nodes. This kind of SDSM system detects an access to a shared page that lacks the required permission by catching a SIGSEGV signal, and a user-defined signal handler replaces the invalid page with a valid one. From the application's point of view, this page update is atomic, since program control returns to the application only after the signal handler has finished servicing the fault. However, these single-threaded systems are inadequate for the thread-based parallelism of OpenMP. The conventional fault-handling process fails in multithreaded environments because other threads may try to access the same page during the update period.

The SDSM system faces a dilemma when multiple threads compete to access an invalid page within a short interval. On the first access to an invalid page, the system must make the page writable in order to replace its contents with a valid copy. Unfortunately, this change also allows other application threads to access the same page freely while it is still being updated. This phenomenon is known as the atomic page update and change right problem [2] or the mmap() race condition [3]. For short, we call it the atomic page update problem.

A known solution to this problem, adopted by major multithreaded SDSM systems such as TreadMarks [4], Brazos [5], and Strings [6], is to map a file to two different virtual addresses. Even though the systems using file mapping achieve fairly good performance on dedicated systems, file mapping is not always the best solution. The operating system and the working environment strongly affect the performance of these systems. Moreover, file mapping has high initialization cost, experiences buffer