Byte code Transformations for Distributed Threads in Java

Eddy Truyen
Distrinet, KULeuven
Celestijnenlaan 200A
3001 Leuven
++32(0)16327602
eddy@cs.kuleuven.ac.be

Danny Weyns
Distrinet, KULeuven
Celestijnenlaan 200A
3001 Leuven
++32(0)16327602
weyns@cs.kuleuven.ac.be

Pierre Verbaeten
Distrinet, KULeuven
Celestijnenlaan 200A
3001 Leuven
++32(0)16327566
pv@cs.kuleuven.ac.be

ABSTRACT
In this paper, we study the shift of thread semantics that arises when adapting a centralized Java program for execution in a distributed environment. More specifically, we focus on distributed applications that are developed by means of a distributed control flow programming model like Java RMI or OMG CORBA. The shift in thread semantics causes unexpected execution results or run-time errors if the programmer has not taken these differences into account. We overcome this semantic gap between local and distributed programming by extending Java programming with the notion of distributed thread identity. Propagation of a globally unique, distributed thread identity provides a uniform mechanism by which all the program’s constituent objects involved in a distributed control flow can uniquely refer to that distributed thread as one and the same computational entity. We have implemented distributed thread identity by means of byte code transformation of application programs.

Keywords
Threads, Distributed systems, Java

1. INTRODUCTION
With the growing need for increased scalability and for systems that are on-line 24 hours a day, 7 days a week, the mass production of cheap and powerful computing devices, and improved networking facilities, companies are evolving their intra-domain systems and applications (e.g. internal business workflow, departmental appointment systems, high performance processing systems) from monolithic centralized servers to modular, decentralized systems executing in a distributed environment.
A distributed environment consists of multiple physical computing nodes that are interconnected by means of a network. Various distributed programming models exist, such as asynchronous messaging, publish/subscribe systems, and blackboard/tuple spaces (e.g. Linda, JavaSpaces). However, most intra-domain distributed systems are developed using an object-based control flow programming model such as Java RMI or OMG CORBA (method invocation between objects). This model is popular because it inherits some of the benefits of object-oriented programming languages such as Java and C++ and is similar to the ‘good old’ RPC inter-process communication style. Another advantage is that control flow programming models provide a good level of location transparency.

However, writing distributed applications or adapting a local program¹ for execution in a distributed environment remains difficult because of the inherent shift of paradigms and semantics [4]. In Java RMI, well-known differences from ‘normal’ Java programming are the separation between the class and the interface of a remote object (shift of paradigms), the pass-by-copy semantics of non-remote arguments to a remote method invocation (shift of semantics), and the inherently more complicated failure modes of remote method invocation (shift of semantics). The paradigm shift makes it necessary to re-engineer large parts of existing centralized programs. The shift of semantics potentially leads to unexpected execution results or run-time errors if the programmer has not taken these differences into account. Some of these problems are well studied, and practical solutions have been worked out that make the implementation of distribution-related aspects more transparent to the programmer [4][1].

In this paper, we study a very particular shift of semantics, namely the shift of thread semantics that arises when adapting a local Java program for execution in a distributed environment. A thread is the unit of computation.
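The shift in thread semantics can be made concrete with a small sketch. It is not the byte code transformation described in this paper and it does not use Java RMI: the remote side is simulated by a single-threaded executor, on the assumption that a remote call is served by a different local thread than the one that issued it. The names `DistributedThreadIdSketch`, `CURRENT_ID`, `servingThread` and `identitySeenByCallee` are hypothetical, and a `java.util.UUID` stands in for a globally unique distributed thread identity.

```java
import java.util.UUID;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

public class DistributedThreadIdSketch {

    // Hypothetical per-thread slot: the distributed thread identity on whose
    // behalf the current local thread runs (null if none was propagated).
    public static final ThreadLocal<UUID> CURRENT_ID = new ThreadLocal<>();

    // Stand-in for a remote JVM: calls are served by a thread other than the caller's.
    private static final ExecutorService REMOTE_SIDE =
            Executors.newSingleThreadExecutor(r -> {
                Thread t = new Thread(r, "remote-side");
                t.setDaemon(true); // do not keep the JVM alive after main returns
                return t;
            });

    // Runs a task on the simulated remote side and waits for its result.
    private static <T> T invokeRemotely(Callable<T> task) {
        try {
            return REMOTE_SIDE.submit(task).get();
        } catch (Exception e) {
            throw new IllegalStateException("simulated remote call failed", e);
        }
    }

    // Returns the local thread that served the simulated remote call.
    public static Thread servingThread() {
        return invokeRemotely(Thread::currentThread);
    }

    // Passes the caller's identity along with the call; the serving thread
    // adopts it for the duration of the call and reports what it sees.
    public static UUID identitySeenByCallee(UUID callerId) {
        return invokeRemotely(() -> {
            CURRENT_ID.set(callerId);
            try {
                return CURRENT_ID.get();
            } finally {
                CURRENT_ID.remove();
            }
        });
    }

    public static void main(String[] args) {
        UUID id = UUID.randomUUID();
        CURRENT_ID.set(id);
        System.out.println("same local thread:       "
                + (servingThread() == Thread.currentThread()));
        System.out.println("same distributed thread: "
                + id.equals(identitySeenByCallee(CURRENT_ID.get())));
    }
}
```

In this sketch the first comparison is false, because the callee runs on the executor's thread, while the second is true, because the identity travels with the call. Passing the identity as an explicit argument is only one possible design, chosen here to keep the example short.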
Each thread is a sequential flow of control within a single address space (i.e. a JVM). More specifically, we focus on distributed applications that are developed by means of an object-based control flow programming model like Java RMI or OMG CORBA. It is important to understand that the computational entities in this kind of application execute as flows of control that may cross physical node boundaries, whereas conventional Java threads are confined to a single address space. In the remainder of this paper we refer to such a distributed computational entity as a distributed thread of control, in short distributed thread [2].

¹ A local program executes completely within the boundaries of one logical address space, e.g. a Java Virtual Machine (JVM).

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Conference ’00, Month 1-2, 2000, City, State. Copyright 2000 ACM 1-58113-000-0/00/0000…$5.00.

A distributed thread is a logical sequential flow of control that may span several address spaces (i.e. Java Virtual Machines (JVMs)). As shown in Figure 1, a distributed thread τ is physically implemented as a concatenation of local (per JVM) threads