Convergence Rates of Distributed Average Consensus with Stochastic Link Failures

Stacy Patterson, Bassam Bamieh, Fellow, IEEE, and Amr El Abbadi, Senior Member, IEEE
Department of Computer Science, Department of Mechanical Engineering
University of California, Santa Barbara
Santa Barbara, CA 93106 USA
Phone: (805) 893-4187 Fax: (805) 893-8533
Email: {sep,amr}@cs.ucsb.edu, bamieh@engineering.ucsb.edu

Abstract—We consider a distributed average consensus algorithm over a network in which communication links fail with independent probability. In such stochastic networks, convergence is defined in terms of the variance of deviation from average. We first show how the problem can be recast as a linear system with multiplicative random inputs which model link failures. We then use our formulation to derive recursion equations for the second order statistics of the deviation from average in networks with and without additive noise. We give expressions for the convergence behavior in the asymptotic limits of small failure probability and large networks. We also present simulation-free methods for computing the second order statistics in each network model and use these methods to study the behavior of various network examples as a function of link failure probability.

Index Terms—Randomized consensus, distributed systems, multiplicative noise, gossip protocols.

I. INTRODUCTION

We consider the distributed average consensus problem over a network with stochastic link failures. Each node has some initial value and the goal is for all nodes to reach consensus at the average of all values using only communication between neighbors in the network graph. Distributed average consensus is an important problem that has been studied in contexts such as vehicle formations [1]–[3], aggregation in sensor networks and peer-to-peer networks [4], load balancing in parallel processors [5], [6], and gossip algorithms [7], [8].

Distributed consensus algorithms have been widely investigated in networks with static topologies, where it has been shown that the convergence rate depends on the second smallest eigenvalue of the Laplacian of the communication graph [9], [10]. However, the assumption that a network topology is static, i.e., that communication links are fixed and reliable throughout the execution of the algorithm, is not always realistic. In mobile networks, the network topology changes as the agents change position, and therefore the set of nodes with which each node can communicate may be time-varying. In sensor networks and mobile ad hoc networks, messages can be lost due to interference, and in wired networks, messages may be dropped due to buffer overflow. In scenarios such as these, it is desirable to quantify the effects that topology changes and communication failures have upon the performance of the averaging algorithm.

* This work was funded in part by NSF grant IIS 02-23022. Submitted to IEEE Transactions on Automatic Control.

In this work, we consider a network with an underlying topology that is an arbitrary, connected, undirected graph where links fail with independent but not necessarily identical probability. In such stochastic networks, we define convergence in terms of the variance of deviation from average. We show that the averaging problem can be formulated as a linear system with multiplicative noise and use our formulation to derive a recursion equation for the second order statistics of the deviation from average. We also give expressions for the mean square convergence rate in the asymptotic limits of small failure probability and large networks. Additionally, we consider the scenario where node values are perturbed by additive noise.
This formulation can be used to model load balancing algorithms in peer-to-peer networks or parallel processing systems, where the additive perturbations represent file insertions and deletions or job creations and completions, with the goal of equilibrating the load amongst the participants. A measure of the performance of the averaging algorithm in this scenario is not how quickly node values converge to the average, but rather how close the node values remain to each other, and therefore to the average of all values as this average changes over time. This problem has been previously studied in networks without communication failures [10], [11]; however, we are unaware of any existing work that addresses this problem in networks with communication failures. We show how our formulation for static-valued networks can be extended to incorporate the additive perturbations and give an expression for the steady-state deviation from average. Finally, for both problem formulations, we present simulation-free methods for computing the second order statistics of the deviation from average, and we use these methods to study the behavior of various network examples as a function of link failure probability.

Although there has been work that gives conditions for convergence with communication failures, to our knowledge, this is the first work that quantifies the effects of stochastic communication failures on the performance of the distributed average consensus algorithm. We briefly review some of the related work below.
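
As an informal illustration of the setting described in this section, the following Python sketch runs a simple averaging iteration on a ring network in which every link fails independently at each step, and estimates the squared deviation from average by Monte Carlo simulation. It is not the simulation-free method of this paper. The ring topology, the symmetric update rule with weight `beta`, the common failure probability `p_fail`, and all function names are assumptions made for the example only.

```python
import random


def ring_edges(n):
    """Undirected ring on n nodes (assumed topology for this sketch)."""
    return [(i, (i + 1) % n) for i in range(n)]


def consensus_step(x, edges, p_fail, beta, rng):
    """One averaging step; each link is independently down with prob. p_fail.

    The exchange over an active link is symmetric, so the sum of the node
    values (and hence their average) is preserved at every step.
    """
    delta = [0.0] * len(x)
    for i, j in edges:
        if rng.random() >= p_fail:  # link is up in this round
            d = beta * (x[j] - x[i])
            delta[i] += d
            delta[j] -= d
    return [xi + di for xi, di in zip(x, delta)]


def deviation_sq(x):
    """Sum of squared deviations of the node values from their average."""
    avg = sum(x) / len(x)
    return sum((xi - avg) ** 2 for xi in x)


def mean_sq_deviation(n, p_fail, beta, steps, trials, seed=0):
    """Monte Carlo estimate of the expected squared deviation after `steps`."""
    rng = random.Random(seed)
    edges = ring_edges(n)
    total = 0.0
    for _ in range(trials):
        x = [float(i) for i in range(n)]  # arbitrary initial values
        for _ in range(steps):
            x = consensus_step(x, edges, p_fail, beta, rng)
        total += deviation_sq(x)
    return total / trials


if __name__ == "__main__":
    for p in (0.0, 0.2, 0.5):
        est = mean_sq_deviation(n=10, p_fail=p, beta=0.25, steps=20, trials=200)
        print(f"p_fail={p:.1f}  estimated squared deviation={est:.4f}")
```

In this sketch, `beta = 0.25` keeps the update a contraction on the ring, where every node has two neighbors. Increasing `p_fail` leaves the average unchanged but slows the decay of the deviation from average, which is the effect the paper quantifies analytically.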