2001 P. Rupa, D. Blough, and D. Bakken

A Preliminary Investigation of Precision vs. Fault Tolerance Trade-offs in Voting Algorithms

Rupa Parameswaran, Douglas M. Blough
School of Electrical & Computer Engr.
Georgia Institute of Technology
[rupa, dblough]@ece.gatech.edu

David Bakken
School of Electrical Engr. & Computer Science
Washington State University
[bakken@eecs.wsu.edu]

Introduction

Replication and voting have been used extensively for providing fault tolerance in many types of systems. In this abstract, we briefly investigate their capability to improve accuracy, or precision, of results. We also discuss trade-offs between precision and fault tolerance degree that can be achieved through adjustment of replication degree and/or voting algorithm. These capabilities are illustrated through a simple simulation model for the results of replicas in a system. This being only a preliminary study, we study only the fault-tolerant midpoint voting algorithm and its variants. We also consider only the Normal distribution for generation of the results from correct replicas.

Simulation Model and Algorithms

The basic components of the simulation model are:

1. n: the number of modules
2. non-faulty distribution: probability distribution of results produced by non-faulty replicas. Here, the normal distribution is considered with a random real value as mean and a standard deviation of 1.0.
3. faulty distribution: distribution of results produced by faulty replicas. Here, these results are distributed in a worst-case manner with respect to the particular voting algorithm being considered.
4. voting algorithm

The voting algorithms considered in our preliminary study are the fault-tolerant midpoint algorithm and two of its variants. The fault-tolerant midpoint algorithm uses a parameter k, which is the assumed maximum number of faulty replicas. n results are input to the voting algorithm (default values can be used for any missing results).
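
As a minimal Python sketch of the voting step, assuming the n results arrive as a list of floats: the fault-tolerant midpoint algorithm sorts the results, drops the k smallest and k largest, and returns the midpoint of the values that remain, while the two variants return the mean or the median of those values instead. The function names (trim, ft_midpoint, ft_mean, ft_median) are illustrative only and do not come from the original study; substitution of default values for missing results is not modeled.

```python
from statistics import mean, median


def trim(results, k):
    """Return the sorted results with the k smallest and k largest removed.

    Assumption for this sketch: at least one value must survive,
    so len(results) must exceed 2 * k.
    """
    if k < 0 or len(results) <= 2 * k:
        raise ValueError("need more than 2k results and k >= 0")
    ordered = sorted(results)
    return ordered[k:len(ordered) - k]


def ft_midpoint(results, k):
    """Midpoint of the interval spanned by the values left after trimming."""
    kept = trim(results, k)
    return (kept[0] + kept[-1]) / 2.0


def ft_mean(results, k):
    """Variant: mean of the values left after trimming."""
    return mean(trim(results, k))


def ft_median(results, k):
    """Variant: median of the values left after trimming."""
    return median(trim(results, k))


if __name__ == "__main__":
    # Five replicas, one of which (100.0) behaves like a faulty outlier.
    votes = [1.0, 2.0, 4.0, 8.0, 100.0]
    print(ft_midpoint(votes, 1))  # midpoint of [2.0, 4.0, 8.0]
    print(ft_mean(votes, 1))      # mean of [2.0, 4.0, 8.0]
    print(ft_median(votes, 1))    # median of [2.0, 4.0, 8.0]
```

With k = 1 and the five sample values above, the outlier 100.0 and the smallest value 1.0 are discarded before any of the three functions compute a result, which is what bounds the influence of up to k faulty replicas.
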
The algorithm discards the k smallest and k largest results and returns the midpoint of the interval spanned by the remaining values. The two variants of this algorithm that we also study simply take the mean or the median of all remaining results after the extreme values have been discarded.

Data and Interpretation

Figures 1 and 2 show results obtained for n=5 and n=7 and for all three of the simulated algorithms. Each data point in these figures represents the standard deviation of the final voted result over 1000 iterations. This standard deviation is a probabilistic measure of the precision of the result.

[Figure 1: N = 5. x-axis: No. of Faults; y-axis: Standard Deviation; series: Mid-point, Mean, Median]

[Figure 2: N = 7. x-axis: No. of Faults; y-axis: Standard Deviation; series: Median, Mean, Mid-point]

The dominant feature obvious from these figures is that replication and voting improve the precision relative to a single node. In the case of a single replica, the standard deviation (σ) of the result is 1. Hence, from a probabilistic standpoint, replication and voting improve the precision of the result by 2-3 times when no faults