DOSGi: An Architecture for Instant Replication

Jörg Domaschka, Holger Schmidt, Franz J. Hauck
Institute of Distributed Systems, Ulm University
{joerg.domaschka,holger.schmidt,franz.hauck}@uni-ulm.de

Rüdiger Kapitza
Informatik 4, University of Erlangen-Nürnberg
rrkapitz@cs.fau.de

Hans P. Reiser
LaSIGE, University of Lisboa
hans@di.fc.ul.pt

Abstract

Replicating off-the-shelf Java applications is difficult due to the inherent non-determinism of standard Java libraries and multithreading. We propose an OSGi-based architecture that makes applications deterministic at deployment time.

1. Introduction

State machine replication is a fundamental concept for providing fault tolerance. All replicas start in the same state, receive the same input in identical order, and behave deterministically; thus they produce identical output and maintain a consistent state. Replication infrastructures typically provide the replication logic in a generic way, separated from the application logic.

In practice, however, Java code is frequently non-deterministic. Changing a replica implementation so that it fulfils the determinism requirements is an intrusive operation that is not orthogonal to the application logic. Examples of non-determinism are time access, random numbers, and access to external resources. Manually finding the respective code sections and replacing them with deterministic operations is cumbersome. Previous work handles some cases by intercepting system library calls [1], but this approach cannot handle non-deterministic behaviour that originates, e.g., in Java class libraries.

Even more problematic is that most Java services rely on multithreading. In this case, replica state may depend on the local scheduling of threads, which is not deterministic across machines.
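Random numbers, one of the sources of non-determinism listed above, show what a deterministic replacement can look like. The following is a minimal sketch, not the DOSGi implementation. It assumes a hypothetical `ReplicaContext` through which the replication framework hands every replica the same agreed seed before a request is executed, and a hypothetical `DeterministicRandom` subclass to which a deployment-time rewriter would redirect `new Random()`. The rewriting itself is not shown; the call site appears already rewritten.

```java
import java.util.Random;

public class DeterministicRandomSketch {

    // Hypothetical per-thread context filled by the replication layer before a
    // request is executed. All replicas are assumed to receive the same seed,
    // e.g., derived from the request's position in the total order.
    static final class ReplicaContext {
        private static final ThreadLocal<Long> SEED = ThreadLocal.withInitial(() -> 0L);

        static void beginRequest(long agreedSeed) {
            SEED.set(agreedSeed);
        }

        static long currentSeed() {
            return SEED.get();
        }
    }

    // Hypothetical drop-in replacement for java.util.Random: the no-argument
    // constructor takes its seed from the replication context instead of
    // deriving it from the local clock.
    static class DeterministicRandom extends Random {
        DeterministicRandom() {
            super(ReplicaContext.currentSeed());
        }
    }

    // Application code originally written as "new Random()"; shown here as a
    // rewriter would leave it after redirecting the allocation.
    static int rollDice() {
        Random rnd = new DeterministicRandom(); // originally: new Random()
        return rnd.nextInt(6) + 1;
    }

    public static void main(String[] args) {
        ReplicaContext.beginRequest(42L);   // replica 1 executes request #42
        int onReplica1 = rollDice();
        ReplicaContext.beginRequest(42L);   // replica 2 executes the same request
        int onReplica2 = rollDice();
        System.out.println(onReplica1 == onReplica2); // prints "true"
    }
}
```

Because the algorithm of `java.util.Random` is fully specified, equal seeds yield equal sequences on every conforming JVM. The open design question is therefore only how replicas agree on the seed, which this sketch assumes is supplied by the totally ordered request stream.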
Executing a single request at a time is a simple solution that ensures determinism, but it reduces performance and may even force a complete redesign if a service implementation based on threads is to be replicated. Alternatively, multithreaded execution can be made deterministic by adequate scheduling support, either by modifying the system scheduler or by using application-level scheduling [2].

We propose the DOSGi approach, which provides instant determinism by automatically intercepting non-deterministic operations during the deployment of Java services on top of an off-the-shelf OSGi infrastructure [3]. The key contribution of this paper is to show how non-deterministic code can be dynamically replaced by a deterministic version. The approach is transparent to the application developer (e.g., the application code may use java.util.Random to create random numbers), does not require a modified JVM or operating system, and works automatically without user intervention at service deployment time. In a DOSGi-enabled distributed infrastructure, a service can be deployed dynamically and instantaneously made deterministic, allowing the infrastructure provider to offer “replication as a service”.

This work was partially supported by FCT through the Multiannual Funding and the CMU-Portugal Programs.

2. Basic architecture of DOSGi

The vision of DOSGi is to make services deterministic on-the-fly at deployment time. DOSGi is embedded in an OSGi system and makes use of a replication framework that is able to replicate services without further precautions. Technically, DOSGi relies on the OSGi component system for service life-cycle management. OSGi uses components (called bundles) that export/import functionality (Java packages) to/from other bundles. When a bundle is installed, the OSGi framework wires the bundle by resolving these package dependencies.
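The wiring step can be illustrated with a simplified resolver. This is a sketch under stated assumptions, not the OSGi API: `Bundle`, `Framework`, and the package name `org.example.log` are hypothetical, and only one preference rule is modelled, loosely following how an OSGi resolver chooses among several exporters of the same package (higher version first, then the lower bundle id). Since bundle ids grow with installation order, installing the same bundles in the same order yields the same wiring on every node, whereas a different order can change it.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class WiringSketch {

    // Hypothetical bundle model: an id assigned at install time, exported
    // packages with a version number, and the names of imported packages.
    record Bundle(long id, String name, Map<String, Integer> exports, List<String> imports) {}

    static final class Framework {
        private final List<Bundle> installed = new ArrayList<>();
        private long nextId = 1;

        // Bundle ids grow monotonically with installation order.
        Bundle install(String name, Map<String, Integer> exports, List<String> imports) {
            Bundle b = new Bundle(nextId++, name, exports, imports);
            installed.add(b);
            return b;
        }

        // Wires each import to one exporter: prefer the higher package version,
        // then the lower bundle id (the bundle that was installed first).
        Map<String, String> wire(Bundle importer) {
            Map<String, String> wiring = new LinkedHashMap<>();
            for (String pkg : importer.imports()) {
                Bundle best = null;
                for (Bundle candidate : installed) {
                    Integer v = candidate.exports().get(pkg);
                    if (v == null) continue;
                    if (best == null) {
                        best = candidate;
                        continue;
                    }
                    int bestVersion = best.exports().get(pkg);
                    if (v > bestVersion || (v == bestVersion && candidate.id() < best.id())) {
                        best = candidate;
                    }
                }
                wiring.put(pkg, best == null ? "<unresolved>" : best.name());
            }
            return wiring;
        }
    }

    // Installs two providers of the same package version in the given order,
    // then an application bundle that imports it, and returns its wiring.
    static Map<String, String> deploy(String first, String second) {
        Framework fw = new Framework();
        fw.install(first, Map.of("org.example.log", 1), List.of());
        fw.install(second, Map.of("org.example.log", 1), List.of());
        Bundle app = fw.install("app", Map.of(), List.of("org.example.log"));
        return fw.wire(app);
    }

    public static void main(String[] args) {
        Map<String, String> nodeA = deploy("logImplX", "logImplY");
        Map<String, String> nodeB = deploy("logImplX", "logImplY");
        Map<String, String> nodeC = deploy("logImplY", "logImplX");
        System.out.println(nodeA);               // {org.example.log=logImplX}
        System.out.println(nodeA.equals(nodeB)); // true: same installation order
        System.out.println(nodeA.equals(nodeC)); // false: different order
    }
}
```

The tie-break on the bundle id is what makes the result depend on installation order, which is why identical installation order across nodes is a precondition for identical wiring.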
DOSGi assumes that an application is composed of multiple bundles. It uses OSGi, in particular its code-loading facilities, its service abstraction, and its support for fine-grained interception, application replacement, and application modification, to eliminate non-determinism. All parts of an OSGi implementation on which the replication and determinism mechanisms depend are deterministic. Non-deterministic functionality such as garbage collection and caching does not influence the mechanisms presented here. In particular, service wiring is deterministic provided that the same set of bundles is installed in identical order on all nodes [3].

Figure 1 sketches the architecture of DOSGi. It consists of the OSGi run-time system, the replication framework, and our fault-tolerance bundle. The latter provides two functionalities. First, it enables the framework to host and instantiate replicas. This is done by the Factory bundle, which allows services running on other DOSGi instances to start new replicas on the current node. It first loads, installs, and starts the required bundles, and then starts the replica. The second key entity is the Rewriter bundle, which comes with the installation hook. The installation hook is invoked each time a new bundle is