Voting with Witnesses: A Consistency Scheme for Replicated Files

Jehan-François Pâris
Computer Systems Research Group
Department of Electrical Engineering and Computer Sciences
University of California, San Diego
La Jolla, California 92093

International Conference on Distributed Computing Systems, 1986, pages 606–612

Abstract

Voting schemes ensure the consistency of replicated files by disallowing all read and write requests that cannot collect an appropriate quorum of copies. This procedure requires a minimum of three copies to be of any practical use and tends to disallow a relatively high number of read and write requests. We propose to replace some of these copies by mere records of the current state of the file. These records, called witnesses, are assigned weights and participate in the collection of quorums. We show that, under very general assumptions, the reliability of a replicated file consisting of n copies and m witnesses is the same as the reliability of a replicated file consisting of n + m copies. We also compare the availability of a replicated file consisting of two copies and one witness with that of a file having three copies and show that, under normal circumstances, the two files have similar availabilities.

Keywords: file consistency, distributed file systems, replicated files, voting.

1 Introduction

Various distributed file systems maintain replicated copies of the same file on different hosts [2, 4, 12, 13, 17]. File replication, as this technique is known, has several major advantages. Since file contents are replicated, no single device failure can destroy any data. Storing copies of a file on two or more distinct hosts guarantees that no single host failure will make the file inaccessible. Moreover, access times for read operations are improved, since replication increases the probability that any given read operation can be performed on a local copy of the file.
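The quorum rule summarized in the abstract can be illustrated with a minimal Python sketch. It is not the paper's algorithm: the `Replica` class, the integer `version` field standing in for "the current state of the file", and the functions `has_majority` and `can_read` are all hypothetical names introduced here. The sketch assumes that a request needs a strict majority of the total weight and that a read additionally needs at least one reachable copy (not a witness) holding the newest state seen in the quorum.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Replica:
    name: str
    weight: int
    version: int              # assumed stand-in for the recorded file state
    is_witness: bool = False  # a witness votes but stores no file data


def has_majority(reachable, all_replicas):
    """True if the reachable replicas hold a strict majority of the total weight."""
    total = sum(r.weight for r in all_replicas)
    collected = sum(r.weight for r in reachable)
    return 2 * collected > total


def can_read(reachable, all_replicas):
    """Assumed read rule: a majority quorum plus one reachable up-to-date copy."""
    if not has_majority(reachable, all_replicas):
        return False
    newest = max(r.version for r in reachable)
    # Witnesses can testify to the newest version but cannot supply the data.
    return any(r.version == newest and not r.is_witness for r in reachable)
```

With two copies A (version 3) and B (version 2) and one witness W (version 3), all of weight 1, `can_read([A, W], ...)` succeeds, whereas `can_read([B, W], ...)` fails: the quorum is reached, but the witness reveals that B is stale and no current copy is reachable.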

The existence of several copies of the same file residing on different hosts immediately raises the issue of file consistency [1, 3, 9–11, 14, 16, 18]. It would indeed be extremely burdensome for the users of a replicated file system to keep track of the status of every copy of every replicated file. We therefore need a scheme to determine which copies of a replicated file are up to date. This scheme must operate correctly in the presence of any combination of host and subnet failures.

Voting is the best known example of such consistency schemes. In its simplest form, voting assumes that the current state of a replicated file is the state of the majority of its copies. Ascertaining the state of a replicated file thus requires accessing a majority of its copies. Should this be prevented by one or more failures, the file is considered unavailable.

We present here an extension of voting in which some copies of the file are replaced by much smaller records of the current state of the file. Although they do not contain any data themselves, these records, called witnesses, can testify about the current state of the replicated file and can vote like conventional copies.

Section 2 of this paper surveys existing consistency schemes for replicated files. Section 3 introduces witnesses and discusses variants of our basic schemes. Section 4 contains a brief reliability and availability analysis of witness schemes under standard Markovian assumptions. Finally, Section 5 presents our conclusions.

“Voting with Witnesses: A Consistency Scheme for Replicated Files.” In Proceedings of the 6th International Conference on Distributed Computing Systems, Cambridge: IEEE, 1986, 606–612. Author’s address: Department of Computer Science, University of Houston, 501 Philip G. Hoffman Hall, Houston, Texas 77204-3010.

2 Existing Consistency Algorithms

As pointed out by Gifford [6], algorithms for maintaining replicated data objects fall into two categories.
Algorithms in the first category select, for each replicated data object, a primary site, or synchronization site, that performs all update arbitrations. Distributed INGRES [17] and LOCUS [12, 19] follow this approach. The main advantage of the scheme is its simplicity. Its main drawback is its vulnerability to failures of the synchronization site. LOCUS then allows the selection of a new synchronization site; even so, all the information present at the former synchronization site is lost.

Algorithms that do not depend on a unique synchronization site are more complex. They can rely either on queued update