Tolerating Byzantine Faulty Clients in a Quorum System
Barbara Liskov
MIT CSAIL
Cambridge, MA, USA
Rodrigo Rodrigues
INESC-ID / Instituto Superior Técnico
Lisbon, Portugal
Abstract
Byzantine quorum systems have been proposed that work
properly even when up to f replicas fail arbitrarily. How-
ever, these systems are not so successful when confronted
with Byzantine faulty clients. This paper presents novel
protocols that provide atomic semantics despite Byzantine
clients. Our protocols prevent Byzantine clients from inter-
fering with good clients: bad clients cannot prevent good
clients from completing reads and writes, and they cannot
cause good clients to see inconsistencies. In addition, we
bound the number of writes that a bad client removed from
operation can leave behind to be performed on its behalf
by a colluder.
Our protocols are designed to work in an asynchronous
system like the Internet and they are highly efficient. We
require 3f + 1 replicas, and either two or three phases to do
writes; reads normally complete in one phase and require
no more than two phases, no matter what the bad clients are
doing.
We also present strong correctness conditions for sys-
tems with Byzantine clients that limit what can be done on
behalf of bad clients once they leave the system. Furthermore,
we prove that our protocols are both safe (they meet
those conditions) and live.
1 Introduction
Quorum systems [4, 13] are valuable tools for building
highly available replicated data services. A quorum system
can be defined as a set of sets (called quorums) with certain
intersection properties. These systems allow read and write
operations to be performed only at a quorum of the servers,
since the intersection properties ensure that any read op-
eration will have access to the most recent value that was
written.
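
The intersection property can be illustrated with a small sketch. It
assumes quorums are all subsets of 2f + 1 replicas out of 3f + 1, a
common choice for Byzantine quorum systems that this section does not
itself specify; the function names are hypothetical. Under that
assumption any two quorums share at least f + 1 replicas, so at least
one replica in the intersection is correct.

```python
from itertools import combinations


def min_quorum_overlap(n: int, q: int) -> int:
    """Smallest possible intersection of two q-sized subsets of n replicas."""
    return max(0, 2 * q - n)


def check_by_enumeration(f: int) -> int:
    """Enumerate every quorum and return the smallest pairwise intersection.

    Assumption (not from the text above): n = 3f + 1 replicas and
    quorums of size 2f + 1.
    """
    n = 3 * f + 1
    q = 2 * f + 1
    quorums = [set(c) for c in combinations(range(n), q)]
    return min(len(a & b) for a in quorums for b in quorums)


for f in (1, 2):
    n, q = 3 * f + 1, 2 * f + 1
    overlap = check_by_enumeration(f)
    # The enumerated minimum matches the closed form 2q - n = f + 1.
    assert overlap == min_quorum_overlap(n, q) == f + 1
    print(f"f={f}: n={n}, quorum size={q}, minimum overlap={overlap}")
```

Because at most f replicas are faulty, an overlap of f + 1 means that a
read quorum always includes a correct replica that took part in the
most recent completed write.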
The original work on quorum systems assumed that
servers fail benignly, i.e., by crashing or omitting some
steps. More recently, researchers have developed tech-
niques that enable quorum systems to provide data avail-
ability even in the presence of arbitrary (Byzantine)
faults [9]. Earlier work provides correct semantics despite
server (i.e., replica) failures and also handles some of the
problems of Byzantine clients [2, 3, 5, 9, 10, 11, 12].
This paper extends this earlier work in two important
ways. First, it defines new protocols that efficiently han-
dle more problems caused by Byzantine clients than pre-
vious approaches. Our protocols compare favorably to all
previous proposals: they either rely on weaker assumptions
(e.g., about the network), or they are more efficient in terms
of operation latency and number of replicas.
Second, the paper formally defines novel correctness
conditions for Byzantine quorum systems, and proves that
our protocols meet such conditions. The correctness condi-
tions are stronger than what has been stated previously [11]
and what has been guaranteed by previous approaches.
Since a dishonest client can write garbage into the shared
variable, it may seem there is little value in limiting what
bad clients can do. But this is not the case, for two reasons.
First, bad clients can cause a protocol to misbehave so that
good clients either are unable to perform operations (i.e., the
protocol is no longer live) or observe incorrect behavior. For
example, if the variable is write-once, a good client might
observe that the variable's state changes multiple times.
Second, bad clients can continue to interfere with good
ones even after they have been removed from operation,
e.g., by a system administrator who learns of the misbe-
havior. We would like to limit such interference so that,
after only a limited number of writes by good clients, any
lurking writes left behind by a bad client will no longer be
visible to good clients. A lurking write is a modification
launched by the bad client before it was removed from op-
eration that will become visible (possibly with help from an
accomplice) after it has left the system. By limiting such
writes we can ensure that the object becomes useful again
after the departure of the bad client, e.g., some invariant that
good clients preserve will hold.
Of course, it is not possible to prevent actions by a bad
client, even if it has been shut down, if the private key it
uses to prove that it is authorized to modify an object can
be used by other nodes; thus, we consider a bad client to be
in the system as long as any node knows its private key. (In
practice this problem might be handled by an administrator
Proceedings of the 26th IEEE International Conference on Distributed Computing Systems (ICDCS’06)
0-7695-2540-7/06 $20.00 © 2006 IEEE