From Total Order to Database Replication

Yair Amir and Ciprian Tutu
Department of Computer Science
Johns Hopkins University
Baltimore, MD 21218, USA
{yairamir, ciprian}@cnds.jhu.edu

Technical Report CNDS-2001-6
http://www.cnds.jhu.edu
November 5, 2001

Abstract

This paper presents in detail an efficient and provably correct algorithm for database replication over partitionable networks. Our algorithm avoids the need for end-to-end acknowledgments for each action while supporting network partitions and merges and allowing dynamic instantiation of new replicas. One round of end-to-end acknowledgments is required only upon a membership change event such as a network partition. New actions may be introduced to the system at any point, not only while in a primary component. We show how performance can be further improved for applications that allow relaxation of consistency requirements. We provide experimental results that demonstrate the superiority of this approach.

1 Introduction

Database replication is quickly becoming a critical tool for providing high availability, survivability and high performance for database applications. However, to provide useful replication one has to solve the non-trivial problem of maintaining data consistency between all the replicas.

The state machine approach [27] to database replication ensures that replicated databases that start consistent will remain consistent as long as they apply the same deterministic actions (transactions) in the same order. Thus, the database replication problem is reduced to the problem of constructing a global persistent consistent order of actions. This is often mistakenly considered easy to achieve using the Total Order service (e.g. ABCAST, Agreed order, etc.) provided by group communication systems.

Early models of group communication, such as Virtual Synchrony, did not support network partitions and merges. The only failures tolerated by these models were process crashes, without recovery.
Under these circumstances, total order is sufficient to create a global persistent consistent order. Unfortunately, almost no real-world system today can guarantee that network partitions never occur. Even in local area networks, network partitions occur regularly due to either hardware (e.g. temporarily disconnected switches) or software (e.g. heavily loaded servers). Of course, in wide area networks, partitions can be common [5].
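
The state machine argument above, and the reason partitions undermine it, can be illustrated with a small sketch. It is only a toy model and not the algorithm of this paper: the Replica class, the replay helper and the "add"/"mul" actions are hypothetical names chosen for illustration. Two replicas that apply the same deterministic actions in the same order end in the same state, while two components that are partitioned and order the same actions independently may diverge.

```python
from dataclasses import dataclass, field


@dataclass
class Replica:
    """Toy replica (hypothetical): a key-value state plus the log of applied actions."""
    state: dict = field(default_factory=dict)
    log: list = field(default_factory=list)

    def apply(self, action):
        # An action is a deterministic (key, op, operand) triple.
        key, op, operand = action
        current = self.state.get(key, 0)
        if op == "add":
            self.state[key] = current + operand
        elif op == "mul":
            self.state[key] = current * operand
        else:
            raise ValueError(f"unknown operation: {op}")
        self.log.append(action)


def replay(actions):
    """Start from an empty replica and apply the actions in the given order."""
    replica = Replica()
    for action in actions:
        replica.apply(action)
    return replica


add_five = ("x", "add", 5)
triple = ("x", "mul", 3)

# Same actions, same order: both replicas reach x == 15.
replica_a = replay([add_five, triple])
replica_b = replay([add_five, triple])

# Assumed partition scenario: each component orders the actions on its own.
component_1 = replay([add_five, triple])  # (0 + 5) * 3 == 15
component_2 = replay([triple, add_five])  # (0 * 3) + 5 == 5
```

In this sketch each component sees a total order locally, yet the two orders disagree, so the states differ once the components merge. A replication algorithm for partitionable networks has to rule out exactly this kind of divergence.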