Federated Deep Reinforcement Learning

Hankz Hankui Zhuo 1, Wenfeng Feng 1, Yufeng Lin 1, Qian Xu 2, Qiang Yang 2

1 Sun Yat-Sen University, Guangzhou, China (zhuohank@mail.sysu.edu.cn, fengwf2014@outlook.com, linyf23@mail2.sysu.edu.cn); 2 WeBank, Shenzhen, China ({qianxu,qiangyang}@webank.com). Correspondence to: Hankz Hankui Zhuo <zhuohank@mail.sysu.edu.cn>.

Abstract

In deep reinforcement learning, building high-quality policies is challenging when the feature space of states is small and the training data is limited. Despite the success of previous transfer learning approaches in deep reinforcement learning, directly transferring data or models from one agent to another is often not allowed in many privacy-aware applications, due to the privacy of the data and/or models. In this paper, we propose a novel deep reinforcement learning framework, namely Federated deep Reinforcement Learning (FedRL), to federatively build high-quality models for agents while respecting their privacy. To protect the privacy of data and models, we apply Gaussian differentials to the information agents share with each other when updating their local models. In the experiments, we evaluate our FedRL framework in two diverse domains, Grid-world and Text2Action, by comparing to various baselines.

1. Introduction

In deep reinforcement learning, building high-quality policies is challenging when the feature space of states is small and the training data is limited. In many real-world applications, moreover, training data from clients cannot simply be aggregated in a central data center, since such datasets are often privacy sensitive (Duchi et al., 2012), and it is therefore difficult for a data center to guarantee building high-quality models. To deal with this issue, Konečný et al. proposed a new learning setting, namely federated learning, whose goal is to train a classification or clustering model with training data involving texts, images, or videos distributed over a large number of clients (Konečný et al., 2016; McMahan et al., 2017).

Different from previous federated learning settings (cf. (Yang et al., 2019)), we propose a novel federated learning framework based on reinforcement learning (Sutton & Barto, 1998; Mnih et al., 2015; Co-Reyes et al., 2018), i.e., Federated deep Reinforcement Learning (FedRL), which aims to learn a private Q-network policy for each agent by sharing limited information (i.e., the output of the Q-network) among agents. The information is "encoded" when it is sent to others and "decoded" when it is received. We assume that some agents have rewards corresponding to states and actions, while others observe only states without rewards. Without rewards, those agents are unable to build decision policies from their own information alone. We claim that all agents benefit from joining the federation when building decision policies.

There are many applications of federated reinforcement learning. For example, in the manufacturing industry, producing products may involve various factories that produce different components of those products. Factories' decision policies are private and will not be shared with each other. On the other hand, building high-quality individual decision policies on their own is often difficult due to their limited businesses and, for some factories, lack of rewards. It is thus helpful for them to learn decision policies federatively under the condition that private data is not given away. Another example is building medical treatment policies for patients in hospitals. Patients may be treated in some hospitals and never give feedback on the treatments, which means these hospitals are unable to collect rewards based on the treatments given to patients and therefore cannot build treatment decision policies.
In addition, data records about patients are private and may not be shared among hospitals. It is thus necessary to learn treatment policies for hospitals federatively.

Our FedRL framework is different from multi-agent reinforcement learning, which is concerned with a set of autonomous agents that observe global states (or partial states that are directly shared to form "global" states), select individual actions, and receive a team reward (or each agent receives an individual reward but shares it with other agents) (Tampuu et al., 2015; Leibo et al., 2017; Foerster et al., 2016). FedRL assumes that agents do not share their partial observations and that some agents are unable to receive rewards. Our FedRL framework is also different from transfer learning in reinforcement learning, which aims to transfer experience gained in learning to perform one task to help

arXiv:1901.08277v3 [cs.LG] 9 Feb 2020
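As a rough illustration of the kind of information sharing FedRL relies on — agents exchanging Gaussian-perturbed Q-network outputs rather than raw data or model parameters — consider the following minimal sketch. The function names, the noise scale `sigma`, and the weighted-sum combination rule are our own illustrative assumptions, not the paper's exact mechanism or notation.

```python
import numpy as np

def share_q_values(q_values, sigma=0.1, rng=None):
    """Perturb a local Q-network's outputs with Gaussian noise before
    sharing them, so a partner agent only ever sees a noised version
    of the output (a simple Gaussian-mechanism sketch).

    sigma controls the privacy/utility trade-off: larger noise hides
    more about the local model but degrades the shared signal.
    """
    rng = np.random.default_rng() if rng is None else rng
    q_values = np.asarray(q_values, dtype=float)
    return q_values + rng.normal(0.0, sigma, size=q_values.shape)

def combine(q_local, q_shared, alpha=0.5):
    """One hypothetical way a receiving agent (e.g. one with no rewards
    of its own) could fold a partner's noised Q-values into its own
    estimates: a fixed weighted sum over the action dimension."""
    return alpha * np.asarray(q_local, dtype=float) + \
        (1.0 - alpha) * np.asarray(q_shared, dtype=float)
```

For example, an agent that cannot observe rewards could act greedily on `combine(q_local, q_shared)` instead of on its own (untrainable) Q-values alone, while the sending agent never exposes its exact Q-network outputs.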