Fault Tolerance in Finite State Machines using Fusion

Bharath Balasubramanian 1, Vinit Ogale 1 and Vijay K. Garg 2

1 Parallel and Distributed Systems Laboratory, Dept. of Electrical and Computer Engineering, The University of Texas at Austin
2 IBM India Research Lab (IRL), Delhi
{balasubr, ogale, garg}@ece.utexas.edu

Abstract. Given a set of n different deterministic finite state machines (DFSMs), we examine the problem of tolerating k faults among them. The traditional approach to this problem involves replication, requiring n.k (n times k) backup DFSMs. For example, given two state machines, say A and B, to tolerate two faults, this approach maintains two extra copies each of A and B, resulting in a total of six DFSMs in the system. In this paper, we question the optimality of such an approach and present another approach, based on the 'fusion' of state machines, that allows for more efficient backups. We introduce the theory of fusion machines and provide an algorithm that generates fusion machines corresponding to a given set of machines. Further, we have implemented this algorithm and tested it on various examples. It is important to note that our approach requires only k backup DFSMs, as opposed to the n.k backup DFSMs required by the replication approach.

1 Introduction

In distributed systems, it is often necessary to maintain the execution state of a server in the event of faults. Hence, designing fault tolerant systems remains an interesting avenue for research in this field. Traditional approaches to this problem require some form of replication. One commonly used technique, which forms the basis of the work done in [1–6], involves replicating the server DFSMs and sending client requests in the same order to all the servers. Another approach, seen in [7, 8], involves designating one of the servers as the primary and all the others as backups. Client requests are handled by the primary server until it fails, and then one of the backups takes over.
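The replication technique described above can be illustrated with a minimal Python sketch. It is not taken from the paper: the `DFSM` class, the helpers `replicate`, `broadcast` and `recover`, and the two toy machines A and B are all hypothetical. The sketch only shows that delivering the same events in the same order keeps every copy in the same state, and that tolerating k = 2 faults among n = 2 machines costs n.k = 4 backups.

```python
from dataclasses import dataclass


@dataclass
class DFSM:
    """A deterministic finite state machine (hypothetical helper class)."""
    name: str
    transitions: dict  # maps (state, event) -> next state
    state: str

    def apply(self, event):
        # Assumption of this sketch: an event with no entry for the
        # current state leaves the state unchanged.
        self.state = self.transitions.get((self.state, event), self.state)

    def copy(self):
        return DFSM(self.name, self.transitions, self.state)


def replicate(machines, k):
    """Return k backup copies of every machine, n.k backups in total."""
    return [m.copy() for m in machines for _ in range(k)]


def broadcast(servers, events):
    """Deliver the same events, in the same order, to every server."""
    for event in events:
        for server in servers:
            server.apply(event)


def recover(name, survivors):
    """Read a crashed machine's state from any surviving copy of it."""
    for server in survivors:
        if server.name == name:
            return server.state
    raise RuntimeError(f"no surviving copy of {name}")


# Toy machines (assumed for illustration): A counts "a" events modulo 3,
# B toggles on every "b" event.
A = DFSM("A", {("a0", "a"): "a1", ("a1", "a"): "a2", ("a2", "a"): "a0"}, "a0")
B = DFSM("B", {("b0", "b"): "b1", ("b1", "b"): "b0"}, "b0")

primaries = [A, B]
backups = replicate(primaries, k=2)  # [A copy, A copy, B copy, B copy]
broadcast(primaries + backups, ["a", "b", "a", "b", "b"])

# Two faults: the primary A and its first backup crash.
survivors = [B] + backups[1:]
recovered_state = recover("A", survivors)
total_machines = len(primaries) + len(backups)
```

In this run the remaining copy of A still holds the state the crashed primary had reached, at the price of six machines in total for two originals.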
Footnote: Supported in part by the NSF Grants CNS-0509024, Texas Education Board Grant 781, and Cullen Trust for Higher Education Endowed Professorship.
Footnote: A significant portion of the work was performed when the author was at the University of Texas at Austin.

In both these approaches, given n different DFSMs, tolerating k faults requires k extra copies of each DFSM, for a total of n.k backup DFSMs. We propose an approach called fusion that allows for more efficient backups. Given n different DFSMs, we tolerate k faults by having k backup DFSMs