Abstract

Building dependable distributed systems using ad hoc methods is a challenging task. Without proper support, an application programmer faces the daunting requirement of providing fault tolerance at the application level, in addition to dealing with the complexities of the distributed application itself. This approach requires a deep knowledge of fault tolerance on the part of the application designer, and has a high implementation cost. What is needed is a systematic approach to providing dependability to distributed applications. Proteus, part of the AQuA architecture, fills this need, and provides facilities to make a standard distributed CORBA application dependable, with minimal changes to the application. Furthermore, it permits applications to specify, either directly or via the Quality Objects (QuO) infrastructure, the level of dependability they expect of a remote object, and will attempt to configure the system to achieve the requested dependability level. Our previous papers have focused on the architecture and implementation of Proteus. This paper describes how to construct dependable applications using the AQuA architecture, by describing the interface that a programmer is presented with and the graphical monitoring facilities that it provides.

¹ This research has been supported by DARPA Contracts F30602-96-C-0315 and F30602-97-C-0276.

1. Introduction

Middleware support for building dependable distributed systems has the potential to ease the burden on application programmers and to increase the dependability of standard applications by providing an easy way to make an application more dependable. In order to be useful, the middleware must be easy to add to an existing distributed application, must run on standard commercial off-the-shelf hardware, and must interfere as little as possible with applications at runtime.
In particular, it should 1) provide a simple interface through which application objects can specify their desires about the dependability of the remote objects they use, 2) provide automatic and transparent detection of and recovery from failures, and 3) manage a pool of resources in a manner consistent with the desires of multiple objects that require dependable remote objects. While these goals are clearly desirable, building a software infrastructure that achieves them is not an easy task.

The AQuA architecture [Cuk98] is one approach to building dependable distributed objects that attempts to meet these goals. In particular, AQuA aims to allow distributed applications to request and obtain a desired level of dependability using Proteus [Sab99]. Proteus dynamically manages the replication of distributed objects in order to make them dependable. More specifically, Proteus takes requests regarding the dependability of remote objects used by an application object and decides how to provide fault tolerance. The choice of how to provide fault tolerance involves choosing the style of replication, the type of faults to tolerate, and the location of the replicas, among other things. Once a decision is made, the system is configured to try to achieve the dependability requested by one or more application objects. The system can be reconfigured if faults occur, or if the requested dependability of one or more application objects changes.

Several projects focus on building dependable distributed objects. The Eternal system [Nar97] adds fault tolerance to applications through object replication. However, Eternal does not support dynamic system configuration changes in response to changing application requirements. Electra [Maf95] provides fault tolerance for CORBA by building a specialized ORB. However, since Electra uses a non-standard ORB to provide group communication services, it is incompatible with other ORBs if the fault-tolerant features are used.
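To make the Proteus decision process described above more concrete (a dependability request arrives, a replication style and replica locations are chosen, and the choice is revisited when a fault occurs), the following Python sketch may help. It is not Proteus code and does not reflect the AQuA interface: the names `DependabilityRequest`, `Configuration`, `choose_configuration`, and `reconfigure` are hypothetical, and the replica-count rule (c + 1 replicas for c crash faults, 2v + c + 1 replicas when v value faults must also be outvoted) is a common textbook rule assumed here purely for illustration.

```python
from dataclasses import dataclass
from typing import List, Sequence


@dataclass(frozen=True)
class DependabilityRequest:
    """Hypothetical request made by an application object (not a Proteus type)."""
    crash_faults: int  # simultaneous crash failures to tolerate
    value_faults: int  # simultaneous value (wrong-reply) faults to tolerate


@dataclass(frozen=True)
class Configuration:
    """Hypothetical outcome of the decision (not a Proteus type)."""
    style: str        # replication style: "active" or "passive"
    hosts: List[str]  # one replica is placed on each listed host


def choose_configuration(request: DependabilityRequest,
                         available_hosts: Sequence[str]) -> Configuration:
    """Pick a replication style and replica locations for one request."""
    c, v = request.crash_faults, request.value_faults
    if c < 0 or v < 0:
        raise ValueError("fault counts must be non-negative")
    if v > 0:
        # Assumed rule: replies are voted on, so after c crashes the
        # correct replicas must still outnumber the v faulty ones.
        style, needed = "active", 2 * v + c + 1
    else:
        # Assumed rule: with crash faults only, one survivor is enough.
        style, needed = "passive", c + 1
    if needed > len(available_hosts):
        # Simplification: this sketch gives up instead of degrading gracefully.
        raise ValueError("not enough hosts for the requested dependability")
    return Configuration(style, list(available_hosts[:needed]))


def reconfigure(request: DependabilityRequest,
                available_hosts: Sequence[str],
                failed_host: str) -> Configuration:
    """Redo the decision after a host failure, using the surviving hosts."""
    survivors = [h for h in available_hosts if h != failed_host]
    return choose_configuration(request, survivors)


if __name__ == "__main__":
    hosts = ["h1", "h2", "h3"]
    request = DependabilityRequest(crash_faults=1, value_faults=0)
    print(choose_configuration(request, hosts))
    print(reconfigure(request, hosts, failed_host="h1"))
```

In this sketch a change in the requested dependability is handled the same way as a fault: the decision function is simply run again with the new request. A real manager would also have to weigh the requests of several application objects against a shared pool of hosts, which the sketch leaves out.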
Building Dependable Distributed Applications Using AQUA¹

Jennifer Ren, Michel Cukier, Paul Rubel, and William H. Sanders
Center for Reliable and High-Performance Computing
Coordinated Science Laboratory and Department of Electrical and Computer Engineering
University of Illinois at Urbana-Champaign, Urbana, Illinois 61801
{ren, cukier, rubel, whs}@crhc.uiuc.edu

David E. Bakken and David A. Karr
BBN Technologies
Cambridge, Massachusetts 02138
{dbakken, dkarr}@bbn.com

The OpenDREAMS research project [Fel96] focuses on the design and implementation of an Object Group Service (OGS), which provides facilities for