Atomic Transducers and Their Scalable Implementation

David Ratajczak∗, Bor-Yuh Evan Chang, Manu Sridharan
Computer Science Division, University of California, Berkeley
{dratajcz,bec,manu s}@eecs.berkeley.edu

Abstract

We present a model of stateful computation for systems composed of a large, dynamic set of processors. Our implementation exhibits the reconfigurability and scalability of recent distributed hash tables [23, 26, 27, 29] while guaranteeing atomic [11, 19] semantics for objects of arbitrary type stored in the system. First, we present an algorithm whose liveness is conditional on nodes (processors) not failing and all messages being delivered reliably. We then apply the state machine approach [15, 28] to actively replicate the nodes of the high-level algorithm and to ensure reliable message delivery under weaker environmental assumptions. This approach allows us to leverage the established correctness of an existing consensus protocol to simplify the proof of the high-level algorithm's atomicity and conditional liveness guarantees. Our implementation forms the basis of Concord, a novel data middleware for clusters of workstations. By providing implicit load-balancing and reconfiguration facilities, we aim to increase the reliability and reduce the operational cost of such systems.

Each author is a full-time student at UC Berkeley. This is a regular presentation submission.

1 Introduction

There has been a flurry of recent research on distributed hash tables (DHTs) as a building block for large-scale distributed systems. DHTs are composed of nodes (processors) that join and leave the system as active participants and share the burden of implementing a hash table of data objects. For large networks, any particular node may know only limited portions of the data set or active node set; thus messages sent to data objects may be "routed" between nodes until one is found that has a copy of the desired object.
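As an illustration of such routing, and not of the scheme proposed in this paper, the following Python sketch assumes a generic identifier ring with exponentially spaced links, in the style of ring-based DHTs such as Chord. The ring size `M`, the helper names (`ident`, `build_tables`, `owns`, `route`), and the node and object names are all hypothetical. Each node consults only its own predecessor and link table when forwarding a message.

```python
import bisect
import hashlib

M = 16            # identifier bits (illustrative assumption)
RING = 1 << M     # size of the circular identifier space


def ident(name):
    """Hash a node or object name onto the ring."""
    digest = hashlib.sha256(name.encode("utf-8")).digest()
    return int.from_bytes(digest[:4], "big") % RING


def dist(a, b):
    """Clockwise distance from a to b on the ring."""
    return (b - a) % RING


def successor(ids, point):
    """First node id at or clockwise after point; ids must be sorted."""
    i = bisect.bisect_left(ids, point % RING)
    return ids[i % len(ids)]


def build_tables(ids):
    """Give each node its predecessor and a sparse set of outgoing links."""
    tables = {}
    for idx, n in enumerate(ids):
        fingers = {successor(ids, n + (1 << k)) for k in range(M)}
        fingers.discard(n)  # a node never forwards to itself
        tables[n] = {"pred": ids[idx - 1], "fingers": sorted(fingers)}
    return tables


def owns(tables, node, key):
    """A node holds the keys in the clockwise arc (pred, node]."""
    pred = tables[node]["pred"]
    if pred == node:  # a single-node system owns every key
        return True
    return 0 < dist(pred, key) <= dist(pred, node)


def route(tables, start, key):
    """Forward a message hop by hop using only per-node tables."""
    path = [start]
    cur = start
    while not owns(tables, cur, key):
        fingers = tables[cur]["fingers"]
        # Links that do not overshoot the key, measured clockwise from cur.
        ahead = [f for f in fingers if dist(cur, f) <= dist(cur, key)]
        if ahead:
            cur = max(ahead, key=lambda f: dist(cur, f))
        else:
            # The key lies between cur and its immediate successor.
            cur = min(fingers, key=lambda f: dist(cur, f))
        path.append(cur)
    return cur, path


node_ids = sorted({ident(f"node-{i}") for i in range(64)})
tables = build_tables(node_ids)
owner, path = route(tables, node_ids[0], ident("object-42"))
```

Here `path` records the nodes a message visits before reaching `owner`. Because every hop strictly reduces the clockwise distance to the key, the walk terminates without any node holding a global view. The sketch ignores joins, departures, failures, and replication, which are the concerns the remainder of the paper addresses.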
DHT proposals are generally distinguished by the way in which the data set is partitioned and sparse routing information is maintained [23, 26, 27, 29, 30], though they commonly yield O(log N) path lengths to data when N nodes are present. Recent work has focused on improving their resiliency under catastrophic or Byzantine failure conditions [2, 8], reducing path latency in heterogeneous networks [30], and reducing the number of links necessary to achieve small path lengths [23].

∗ 592 Soda Hall, Computer Science Division, UC Berkeley, Berkeley CA 94720-1776.

While current schemes provide the promise of extreme scalability and dynamism, they do so at the expense of strong or even well-defined data semantics; they may fail to coordinate modifications to all replicas of an object, unknowingly return stale versions of data, or incorrectly inform the client that an object does not exist. Applications built atop such middleware services must either be restricted to write-once semantics [6] or exhibit unspecified behavior in "rare-case" scenarios [24]. In these schemes, the accessibility of the entire data set requires that network timing assumptions hold throughout the execution, complicating the analysis of such systems [17] and limiting their resilience to changing network conditions.

Absolutely guaranteeing both the availability and atomicity of data in an asynchronous, failure-prone system is impossible [9]. As a result, any atomic system will be subject to periods of unavailability when certain liveness conditions are not met. However, we feel that atomicity is a crucial prerequisite for a data middleware service, both because of its intuitive simplicity and its composability [11, 19]. Moreover, an atomic system prone to periods of unavailability may be used as a backend for caching schemes providing increased availability with weaker semantics [1, 12]; it may not be possible to strengthen the semantics of a highly-available but non-atomic system.
Hence we desire a system that absolutely guarantees atomicity while exhibiting the scalability of DHTs and "best effort" availability.

To this end, we first propose a model of stateful computation—naturally extending the traditional