On the Rate of Convergence of Distributed Subgradient Methods for Multi-agent Optimization

Angelia Nedić, Department of Industrial and Enterprise Systems Engineering, University of Illinois, Urbana-Champaign, IL 61801, Email: angelia@uiuc.edu
Asuman Ozdaglar, Department of Electrical Engineering and Computer Science, Massachusetts Institute of Technology, Cambridge, MA 02142, Email: asuman@mit.edu

Abstract— We study a distributed computation model for optimizing the sum of convex (nonsmooth) objective functions of multiple agents. We provide convergence results and estimates for the convergence rate. Our analysis explicitly characterizes the tradeoff between the accuracy of the approximate optimal solutions generated and the number of iterations needed to achieve the given accuracy.

I. INTRODUCTION

There has been considerable recent interest in the analysis of large-scale networks, such as the Internet, which consist of multiple agents with different objectives. For such networks, it is essential to design network control methods that can operate in a decentralized manner with limited local information and converge to an approximately optimal operating point fairly rapidly. Recent literature has adopted the utility-maximization framework of economics to design distributed algorithms that capture the multiple objectives of different agents, represented by different utility functions (see Kelly et al. [7], Low and Lapsley [8], and Srikant [13]). This framework builds on convex optimization duality and is limited to applications where the utility function of an agent depends only on the resource allocated to that agent. In many applications, however, an individual agent's utility depends on the entire resource allocation vector. In this paper, we study a simple distributed computation model that captures these interactions and provide a convergence rate analysis for the resulting algorithm.
In particular, we consider a network consisting of a set V = {1, ..., m} of nodes (or agents) that cooperatively minimize a common additive cost. More formally, the agents want to cooperatively solve the following unconstrained optimization problem:

minimize \sum_{i=1}^m f_i(x) subject to x \in \mathbb{R}^n, (1)

where f_i : \mathbb{R}^n \to \mathbb{R} is a convex function for all i. Let f(x) = \sum_{i=1}^m f_i(x). We denote the optimal value of this problem by f^*. We also denote the optimal solution set by X^*. We assume that each agent i has information only about his/her cost function f_i. Every agent generates and maintains estimates of the optimal solution to problem (1), and communicates them directly or indirectly to the other agents. Each agent updates his/her estimate by combining it with the estimates received from the other agents (if any) and by using the subgradient information of f_i. Our model is in the spirit of the distributed computation model proposed by Tsitsiklis [14] (see also Tsitsiklis et al. [15], Bertsekas and Tsitsiklis [3]). There, the main focus is on minimizing a (smooth) function f(x) = \sum_{i=1}^m f_i(x) by distributing the vector components x_j, j = 1, ..., n, among n processors. The possibility of distributing the component functions f_i among m agents was suggested in [14], but has not been fully explored. Here, we pursue this idea in depth, with a focus on the case where the component functions f_i are convex but not necessarily smooth. To our knowledge, this is the first distributed model of its kind that is rigorously analyzed. For this model, we present convergence results and estimates for the rate of convergence. In particular, we show that there is a tradeoff between the quality of an approximate optimal solution and the computational load required to generate such a solution. Our convergence rate estimate captures this dependence explicitly in terms of the system and algorithm parameters.
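The update scheme described above (each agent combines the estimates received from other agents and then takes a step along a subgradient of its own cost) can be sketched numerically. The setup below is purely illustrative and not the paper's exact model: it assumes nonsmooth costs f_i(x) = |x - c_i|, uniform averaging weights over a complete network, and a fixed stepsize.

```python
import numpy as np

# Illustrative assumption: m agents with f_i(x) = |x - c_i|, a nonsmooth
# convex cost; the sum is minimized at the median of the c_i (here 2.0).
m = 5
c = np.array([0.0, 1.0, 2.0, 3.0, 10.0])
x = np.zeros(m)               # each agent's current estimate of the optimum
A = np.full((m, m), 1.0 / m)  # uniform stochastic weights (complete graph; assumption)

def subgrad(i, xi):
    # a subgradient of f_i(x) = |x - c_i| at xi
    return np.sign(xi - c[i])

alpha = 0.05  # fixed stepsize (illustrative choice)
for k in range(2000):
    mixed = A @ x  # each agent averages the estimates received from others
    # then each agent takes a subgradient step using only its own f_i
    x = mixed - alpha * np.array([subgrad(i, mixed[i]) for i in range(m)])

print(np.round(x, 2))  # all estimates end up close to the minimizer 2.0
```

With a fixed stepsize the estimates do not converge exactly but hover within a stepsize-dependent band around the minimizer, which is the kind of accuracy-versus-iterations tradeoff the paper's rate analysis quantifies.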
Related computational models for reaching consensus on a particular scalar value have attracted a lot of recent attention as natural models of cooperative behavior in networked systems (see Vicsek et al. [16] and Jadbabaie et al. [6]). There is also another line of related work that focuses on computing exact averages of the initial values of the agents (see Boyd et al. [5] and Kashyap et al. [1]).

The remainder of this paper is organized as follows: in Section 2, we describe a distributed computation model and the assumptions on the interaction rules among the agents. In Section 3, we present our assumptions and preliminary results. In Section 4, we provide our main convergence and rate of convergence results. Finally, in Section 5, we present our concluding remarks.

Notation and Basic Notions

For a matrix A, we write A_i^j or [A]_i^j to denote the matrix entry in the i-th row and j-th column. We write [A]_i to denote the i-th row of the matrix A, and [A]^j to denote the j-th column of A. A vector a \in \mathbb{R}^m is said to be a stochastic vector when its components a_i, i = 1, ..., m, are nonnegative and \sum_{i=1}^m a_i = 1. A square m × m matrix