Robust Federated Learning Using ADMM in the Presence of Data Falsifying Byzantines

Qunwei Li*, Bhavya Kailkhura†, Ryan Goldhahn†, Priyadip Ray†, Pramod K. Varshney*
*Syracuse University   †Lawrence Livermore National Laboratory

Abstract

In this paper, we consider the problem of federated (or decentralized) learning using ADMM with multiple agents. We consider a scenario where a certain fraction of agents (referred to as Byzantines) provide falsified data to the system. In this context, we study the convergence behavior of the decentralized ADMM algorithm. We show that ADMM converges linearly to a neighborhood of the solution to the problem under certain conditions. We next provide guidelines for network structure design to achieve faster convergence. Next, we provide necessary conditions on the falsified updates for exact convergence to the true solution. To tackle the data falsification problem, we propose a robust variant of ADMM. We also provide simulation results to validate the analysis and show the resilience of the proposed algorithm to Byzantines.

1 Introduction

Many machine learning and statistics problems fit into the general framework in which a finite sum of functions is to be optimized. In general, the problem is formulated as

    \min_{x \in \mathbb{R}^N} f(x),  \quad  f(x) = \sum_{i=1}^{L} f_i(x).    (1)

The problem structure in (1) covers collaborative autonomous inference in statistics, as well as linear/logistic regression, support vector machines, and deep neural networks in machine learning. With the emergence of the big data era and the associated growth in dataset sizes, solving problem (1) on a single node (or agent) is often impossible, as storing the entire dataset on a single node becomes infeasible.
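As a concrete illustration of the finite-sum structure in (1), the following sketch takes f_i to be a local least-squares loss held by agent i (one of the regression instances mentioned above). All names here (A, b, the choice L = 4, etc.) are illustrative and not from the paper:

```python
import numpy as np

# Finite-sum objective f(x) = sum_{i=1}^{L} f_i(x), where agent i holds
# local data (A_i, b_i) and f_i(x) = 0.5 * ||A_i x - b_i||^2.
rng = np.random.default_rng(0)
L, N, m = 4, 3, 10                  # L agents, dimension N, m samples each
A = [rng.standard_normal((m, N)) for _ in range(L)]
b = [rng.standard_normal(m) for _ in range(L)]

def f_i(x, i):
    """Local objective known only to agent i."""
    r = A[i] @ x - b[i]
    return 0.5 * r @ r

def f(x):
    """Global objective: the sum of the local objectives."""
    return sum(f_i(x, i) for i in range(L))

# A centralized solver would stack all the data in one place:
A_full, b_full = np.vstack(A), np.concatenate(b)
x_star, *_ = np.linalg.lstsq(A_full, b_full, rcond=None)
```

When the dataset is too large to hold on one node, even forming A_full is infeasible; the agents must then minimize f collaboratively while each keeps its own (A_i, b_i).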
This gives rise to the federated optimization setting [9], in which the training data for the problem is stored in a distributed fashion across a number of interconnected nodes and the optimization problem is solved collectively by the cluster of nodes. However, distributing computation over several nodes induces a higher risk of failures, including communication noise, crashes, and computation errors. Furthermore, some nodes, often referred to as Byzantine nodes, may intentionally inject false data to gain an unfair advantage or to degrade system performance. While Byzantines (originally proposed in [10]) may, in general, refer to many types of unwanted behavior, our focus in this paper is on data falsification. Data-falsifying Byzantines can easily prevent the convergence of the federated learning algorithm [8, 9].

There exist several decentralized optimization methods for solving (1), including belief propagation [15], distributed subgradient descent algorithms [13], dual averaging methods [4], and the alternating direction method of multipliers (ADMM) [2]. Among these, ADMM has drawn significant attention, as it is well suited for distributed optimization and demonstrates fast convergence in many applications [19, 17]. More specifically, ADMM was found to converge linearly for a large class of problems [7]. In [16], a linear convergence rate was also established for decentralized ADMM. Recently, the performance analysis of ADMM in the presence of inexactness in the updates has received some attention [1, 20, 18, 14, 6, 5]. Most relevant to our work, [3] studies the inexact ADMM algorithm for the decentralized consensus problem. The authors in [11] studied the scenario where an error e_k occurs in the ADMM update x_k. They considered the occurrence of the error in the x-update step in (7), but failed to consider it in the α-update step.
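To make the error model concrete, the sketch below runs the standard global-consensus form of ADMM (not the paper's exact decentralized updates (7)) on the least-squares instance of (1), with one agent adding a falsification term e_k to its local x-update. The names rho, byzantine, and the noise scale 0.5 are all illustrative assumptions:

```python
import numpy as np

# Global-consensus ADMM for f(x) = sum_i 0.5*||A_i x - b_i||^2, with one
# Byzantine agent corrupting its x-update each iteration.
rng = np.random.default_rng(1)
L, N, m, rho, T = 5, 3, 20, 1.0, 100
A = [rng.standard_normal((m, N)) for _ in range(L)]
b = [rng.standard_normal(m) for _ in range(L)]

z = np.zeros(N)                       # consensus variable
u = [np.zeros(N) for _ in range(L)]   # scaled dual variables
byzantine = 0                         # index of the data-falsifying agent

for k in range(T):
    x = []
    for i in range(L):
        # Closed-form local x-update for the least-squares f_i:
        # x_i = argmin f_i(x) + (rho/2)*||x - z + u_i||^2
        xi = np.linalg.solve(A[i].T @ A[i] + rho * np.eye(N),
                             A[i].T @ b[i] + rho * (z - u[i]))
        if i == byzantine:
            xi = xi + 0.5 * rng.standard_normal(N)   # falsified update e_k
        x.append(xi)
    z = np.mean([x[i] + u[i] for i in range(L)], axis=0)   # z-update
    for i in range(L):
        u[i] = u[i] + x[i] - z                              # dual update

# Distance of the consensus iterate from the true minimizer of f.
x_star, *_ = np.linalg.lstsq(np.vstack(A), np.concatenate(b), rcond=None)
err = np.linalg.norm(z - x_star)
```

With the falsified updates, the iterates settle in a neighborhood of x_star rather than converging to it exactly, which is the qualitative behavior analyzed in the remainder of the paper.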
However, most of the aforementioned papers consider that the inexactness occurs in an intermediate step of the proximal mapping within one ADMM iteration, which is

arXiv:1710.05241v1 [cs.LG] 14 Oct 2017