A DSM PROTOCOL AWARE OF BOTH THREAD MIGRATION AND MEMORY CONSTRAINTS

Ronald Veldema 1, Bradford Larsen 2, Michael Philippsen 1
1 University of Erlangen-Nuremberg, Computer Science Department, Programming Systems Group, Martensstr. 3, 91058 Erlangen, Germany
2 University of New Hampshire, Computer Science Department
email: {veldema, philippsen}@cs.fau.de, brad.larsen@gmail.com

ABSTRACT
A DSM protocol ensures that a thread can access data allocated on another machine using some consistency protocol. The consistency protocol can either replicate the data and unify replica changes periodically, or the thread, upon remote access, can migrate to the machine that hosts the data and access the data there. There is a performance trade-off between these extremes. Data replication suffers from a high memory overhead, as every replicated object or page consumes memory on each machine. On the other hand, migrating threads upon each remote access is just as bad, since repeated accesses to the same distributed data set cause repeated network communication, whereas replication incurs this cost only once (at the price of increased administration overhead to manage the replicas).

We propose a hybrid protocol that uses selective replication with thread migration as its default. Even in the presence of extreme memory pressure and thread migrations, our protocol reaches or exceeds the performance that can be achieved by means of manual replication and explicit changes to the application's code.

KEY WORDS
DSM, protocol, virtual machine.

1 Introduction

There are many problems that require a large memory, larger than a single machine's core memory or even a whole cluster's combined core memories. To name some examples: combinatorial search problems, problems that use large graphs, particle simulations with large numbers of particles, etc. The DSM protocol presented here solely addresses those problem classes where swapping is always needed.
It is implemented as an extension of LVM [8], a virtual machine for Java that adds a distributed shared memory. LVM supports these large problem sizes efficiently by implementing its own swapping of objects to disk instead of relying on the operating system. In a cluster context, each machine adds its memory and disk space to the available global memory. Thread migration is used to access remote objects. However, to avoid excessive thread migration, selective object replication is needed. This paper presents such a protocol, which limits object replication to curb both memory usage and the amount of thread migration required.

We use thread migration by default for two reasons. First, all DSM protocols that fetch data for their operation (whether using lazy-, release-, entry-, or scope-consistency protocols) require copies of data for calculating local changes. In effect, this at least halves the available application memory. Second, we cannot afford to fetch huge numbers of objects locally.

The contributions of this paper are: a way to allow both replication and thread migration in the same protocol efficiently, a way to limit replication to a fixed pool of memory, and a simple heuristic to decide which objects to replicate. Note that no other DSM protocol that we know of fully addresses the problem of maintaining memory consistency of shared data under thread migration.

2 Related Work

Most DSMs (including the ones mentioned here) assume that applications completely fit into memory and that sufficient memory is available to keep two or more copies of all data. Our protocol only replicates a small part of the data. No other system performs our lazy diff-pulling on thread migration or limits the amount of memory available for replication.

Some page-based DSM systems, e.g. the Coherent Virtual Machine (CVM) [7] and Millipede [4], can improve performance by selectively applying thread migration.
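To make the idea concrete, the following is a minimal sketch of a replication policy like the one described above: thread migration is the default, and an object is replicated only once it has been accessed remotely often enough and a fixed-size replica pool still has room. All names here (ReplicaPool, shouldReplicate, the threshold value) are illustrative assumptions, not LVM's actual interface.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch: a fixed memory budget for replicas plus a simple
// access-count heuristic deciding between replication and thread migration.
class ReplicaPool {
    private final long capacityBytes;   // fixed budget; never exceeded
    private long usedBytes = 0;
    private final Map<Long, byte[]> replicas = new HashMap<>();
    private final Map<Long, Integer> remoteAccessCount = new HashMap<>();
    // Illustrative threshold: replicate after this many remote accesses.
    private static final int REPLICATION_THRESHOLD = 3;

    ReplicaPool(long capacityBytes) { this.capacityBytes = capacityBytes; }

    /**
     * Called on each remote access. Returns true if the object should be
     * replicated locally; false means the thread migrates to the data.
     */
    boolean shouldReplicate(long objectId, int objectSize) {
        int count = remoteAccessCount.merge(objectId, 1, Integer::sum);
        return count >= REPLICATION_THRESHOLD
            && usedBytes + objectSize <= capacityBytes;
    }

    void addReplica(long objectId, byte[] data) {
        if (replicas.put(objectId, data) == null) {
            usedBytes += data.length;
        }
    }

    long usedBytes() { return usedBytes; }
}

public class ReplicationDemo {
    public static void main(String[] args) {
        ReplicaPool pool = new ReplicaPool(1024); // 1 KiB replica budget
        long objA = 1L;
        // First two remote accesses stay below the threshold: migrate.
        System.out.println(pool.shouldReplicate(objA, 512)); // false
        System.out.println(pool.shouldReplicate(objA, 512)); // false
        // Third access: the object is hot and the budget has room: replicate.
        boolean replicate = pool.shouldReplicate(objA, 512);
        System.out.println(replicate); // true
        if (replicate) pool.addReplica(objA, new byte[512]);
        System.out.println(pool.usedBytes()); // 512
    }
}
```

Once the pool is full, further hot objects simply keep triggering migration, so replica memory consumption stays bounded regardless of the working-set size; the paper's actual heuristic for choosing which objects to replicate is discussed later.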
In general, if a given page is written often enough, thread migration is applied (whereas we use thread migration by default and replicate objects with a lazy self-consistency protocol). Unlike with LVM, with TreadMarks/CVM/Millipede the available memory does not grow when machines are added.

CRL [5] is a DSM library (for C) that provides an API for wrapping regions of memory into shared objects (start-read/write(X), end-read/write(X)). Upon start-read/write(X), the region X is mapped locally. MCRL [3] extends CRL with thread migration. A start-write(X) now causes migration of the current activation record to the machine that hosts X. Under some heuristics, some reads cause computation migration as well. Overall, we differ from MCRL's protocols in various ways: we per-