Voting with Witnesses: A Consistency Scheme for Replicated Files

Jehan-François Pâris †
Computer Systems Research Group
Department of Electrical Engineering and Computer Sciences
University of California, San Diego
La Jolla, California 92093

International Conference on Distributed Computing Systems, 1986, pages 606–612

Abstract

Voting schemes ensure the consistency of replicated files by disallowing all read and write requests that cannot collect an appropriate quorum of copies. This procedure requires a minimum of three copies to be of any practical use and tends to disallow a relatively high number of read and write requests. We propose to replace some of these copies by mere records of the current state of the file. These records, called witnesses, will be assigned weights and participate in the collection of quorums. We show that, under very general assumptions, the reliability of a replicated file consisting of n copies and m witnesses is the same as the reliability of a replicated file consisting of n + m copies. We also compare the availability of a replicated file consisting of two copies and one witness with that of a file having three copies and show that, under normal circumstances, the two files have similar availabilities.

Keywords: file consistency, distributed file systems, replicated files, voting.

1 Introduction

Various distributed file systems maintain replicated copies of the same file on different hosts [2, 4, 12, 13, 17]. File replication, as this technique is known, has several major advantages. Since file contents are replicated, no single device failure can destroy any data. Storing copies of a file on two or more distinct hosts guarantees that no single host failure will make the file inaccessible. Moreover, access times for read operations are improved, since replication increases the probability that any given read operation can be performed on a local copy of the file.
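To make the quorum rule of the abstract more concrete, the following Python sketch models one replicated file as a list of voters, each either a full copy or a witness. It is a minimal illustration under stated assumptions, not the paper's protocol: every voter is assumed to store a version number (a witness stores nothing else), every voter has weight 1 by default, and both reads and writes require a strict majority of the total weight. The names `Site`, `majority`, `has_quorum`, `read`, and `write` are hypothetical. Because any two majorities intersect, the highest version number found in a quorum identifies the current state of the file, and a witness can supply that testimony even though it holds no data.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class Site:
    """A voter: a full copy of the file, or a witness that stores no data.

    Assumption (for illustration): every voter records the file's version number.
    """
    name: str
    is_witness: bool
    version: int = 0
    weight: int = 1
    up: bool = True


def majority(sites: list[Site]) -> int:
    """Smallest total weight that is a strict majority of all votes."""
    return sum(s.weight for s in sites) // 2 + 1


def has_quorum(reachable: list[Site], sites: list[Site]) -> bool:
    """True if the reachable voters together hold a majority of the votes."""
    return sum(s.weight for s in reachable) >= majority(sites)


def read(sites: list[Site]) -> str | None:
    """Name of a reachable, up-to-date full copy, or None if the read is disallowed."""
    reachable = [s for s in sites if s.up]
    if not has_quorum(reachable, sites):
        return None  # no quorum: the file is considered unavailable
    # Any two majorities intersect, so the highest version in the quorum is current.
    current = max(s.version for s in reachable)
    for s in reachable:
        if not s.is_witness and s.version == current:
            return s.name
    return None  # witnesses know the current version, but no current copy is reachable


def write(sites: list[Site]) -> bool:
    """Install a new version on every reachable voter; False if the write is disallowed."""
    reachable = [s for s in sites if s.up]
    if not has_quorum(reachable, sites):
        return False
    if all(s.is_witness for s in reachable):
        return False  # the new contents must be stored on at least one full copy
    new_version = max(s.version for s in reachable) + 1
    for s in reachable:
        s.version = new_version  # a witness records the version number only
    return True


# Two copies and one witness, as in the abstract's comparison.
demo = [Site("A", False), Site("B", False), Site("W", True)]
demo[0].up = False      # copy A fails
print(write(demo))      # True: B and W hold 2 of 3 votes
demo[0].up = True
demo[1].up = False      # copy B fails; A is now stale
print(read(demo))       # None: W shows that A missed version 1
```

In this sketch, the final read is disallowed even though copy A and witness W form a quorum: W proves that A is stale, and the only current copy, B, is down. With three full copies, the same sequence would leave a current third copy reachable, so the read would succeed; this is one way a two-copies-plus-one-witness configuration can fall short of three copies outside normal circumstances.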
The existence of several copies of the same file residing on different hosts immediately raises the issue of file consistency [1, 3, 9–11, 14, 16, 18]. It would be extremely burdensome for the users of a replicated file system to keep track of the status of every copy of every replicated file. We therefore need a scheme to determine which copies of the replicated file are up to date. This scheme must operate correctly in the presence of any combination of host and subnet failures.

Voting is the best-known example of such consistency schemes. In its simplest form, voting assumes that the current state of a replicated file is the state of the majority of its copies. Ascertaining the state of a replicated file thus requires accessing a majority of its copies. Should this be prevented by one or more failures, the file is considered unavailable.

We present here an extension of voting in which some copies of the file are replaced by much smaller records of the current state of the file. Although they contain no data themselves, these records, called witnesses, can testify about the current state of the replicated file and can vote like conventional copies.

Section 2 of this paper surveys existing consistency schemes for replicated files. Section 3 introduces witnesses and discusses variants of our basic scheme. Section 4 contains a brief reliability and availability analysis of witness schemes under standard Markovian assumptions. Finally, Section 5 presents our conclusions.

† “Voting with Witnesses: A Consistency Scheme for Replicated Files.” In Proceedings of the 6th International Conference on Distributed Computing Systems, Cambridge: IEEE, 1986, 606–612. Author’s address: Department of Computer Science, University of Houston, 501 Philip G. Hoffman Hall, Houston, Texas 77204-3010.

2 Existing Consistency Algorithms

As pointed out by Gifford [6], algorithms for maintaining replicated data objects fall into two categories.
Algorithms in the first category select, for each replicated data object, a primary site, or synchronization site, that performs all update arbitrations. Distributed INGRES [17] and LOCUS [12, 19] follow this approach. The main advantage of this approach is its simplicity. Its main drawback is its vulnerability to failures of the synchronization site. LOCUS then allows the selection of a new synchronization site; even so, all the information present at the former synchronization site is lost.

Algorithms that do not depend on a unique synchronization site are more complex. They can rely either on queued update