Neurocomputing xxx (xxxx) xxx

Coordinated behavior of cooperative agents using deep reinforcement learning

Elhadji Amadou Oury Diallo, Ayumi Sugiyama, Toshiharu Sugawara

Department of Computer Science and Communications Engineering, Waseda University, 3-4-1 Okubo, Shinjuku-ku, Tokyo 169-8555, Japan

Article history: Received 23 March 2018; Revised 16 July 2018; Accepted 17 August 2018; Available online xxx

Keywords: Deep reinforcement learning; Multi-agent systems; Cooperation; Coordination

Abstract

In this work, we focus on an environment where multiple agents with complementary capabilities cooperate to generate non-conflicting joint actions that achieve a specific target. The central problem addressed is how several agents can collectively learn to coordinate their actions such that they complete a given task together without conflicts. However, sequential decision-making under uncertainty is one of the most challenging issues for intelligent cooperative systems. To address this, we propose a multi-agent concurrent framework in which agents learn coordinated behaviors in order to divide their areas of responsibility. The proposed framework is an extension of recent deep reinforcement learning algorithms such as DQN, double DQN, and dueling network architectures. We then investigate how the learned behaviors change according to the dynamics of the environment, the reward scheme, and the network structure. Next, we show how agents behave and choose their actions such that the resulting joint actions are optimal. We finally show that our method can lead to stable solutions in our specific environment.

© 2019 Elsevier B.V. All rights reserved.
1. Introduction

Multi-agent systems [1–4] are very important in the sense that many systems can be modeled to cope with the limitations of the processing power of a single agent. These systems can profit from many advantages of distributed systems, such as robustness, parallelism, and scalability [5]. Many real-world systems are achieved by collective and cooperative effort. The need for collaboration between agents, i.e., intelligent computational entities with specialized functionality, becomes even more evident when looking at examples such as traffic control [6–8], task allocation [9–12], ant colonies [13,14], time-varying formation control [15,16], and biological cells [17]. Because of their wide applicability, multi-agent systems (MASs) arise in a variety of domains including robotics [18], distributed control [19], telecommunications [20–22], and economics [23,24].

The common pattern among all of the aforementioned examples is that the system consists of many agents that wish to collectively reach a certain global goal or individual goals. While these agents can often communicate with each other by various means, such as observing other protagonists and exchanging messages, decision making in an intelligent MAS is challenging because the appropriate behavior of one agent is inevitably influenced by the behaviors of others, which are often uncertain and not observable. In short, the goal can only be reached if most of the agents work together, while self-interested agents should be prevented from ruining the global task for the rest.

∗ Corresponding author. E-mail address: diallo.oury@fuji.waseda.jp (E.A.O. Diallo).
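As a toy illustration of this setting (a minimal sketch, not the framework proposed in this paper): two independent, stateless Q-learners repeatedly choose one of two zones and receive a positive reward only when their joint action is conflict-free, i.e., when the chosen zones differ. Neither agent observes the other or communicates; each learns only from its own environmental feedback. All names and parameter values below are illustrative assumptions.

```python
import random

# Illustrative toy only (not the paper's framework): two independent
# stateless Q-learners each pick one of two zones; the joint action is
# conflict-free (reward 1 for both agents) only when the zones differ.
# All parameter values below are arbitrary assumptions.

random.seed(0)

N_ZONES = 2    # action space: which zone to take responsibility for
ALPHA = 0.1    # learning rate
EPSILON = 0.1  # exploration probability
EPISODES = 5000

# One Q-value per action for each of the two agents (no shared state).
q = [[0.0] * N_ZONES for _ in range(2)]

def act(agent):
    """Epsilon-greedy action selection from the agent's own Q-table."""
    if random.random() < EPSILON:
        return random.randrange(N_ZONES)
    return q[agent].index(max(q[agent]))

for _ in range(EPISODES):
    a0, a1 = act(0), act(1)
    reward = 1.0 if a0 != a1 else 0.0  # both rewarded iff no conflict
    # Each agent updates only its own chosen action from its own reward.
    q[0][a0] += ALPHA * (reward - q[0][a0])
    q[1][a1] += ALPHA * (reward - q[1][a1])

greedy = (q[0].index(max(q[0])), q[1].index(max(q[1])))
print(greedy)  # the learned division of zones between the two agents
```

Although neither agent observes the other, the coupled reward typically drives the two greedy policies to complementary actions, an implicit division of responsibility; a joint learner that observes both actions could reach the same outcome directly.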
The main questions related to multi-agent learning are "How much cooperation is required, and how much can be achieved by the agents?" and "How can agents learn which action they should perform under given circumstances?" At one end of the spectrum, we have independent learners that try to optimize their own behavior without any form of communication with the other protagonists; they use only the feedback received from the environment. At the other end, we have joint learners, where every agent reports every step it takes to every other agent before proceeding to the next step.

Multi-agent learning [25–27] is a key technique in distributed artificial intelligence. As expected, computer scientists have been working on extending reinforcement learning (RL) [28] to multi-agent systems to identify appropriate behavior in complex systems. Markov games [29] have been widely recognized as the prevalent model of multi-agent reinforcement learning (MARL). In general, there are two types of learning in multi-agent systems: centralized learning (the learning is done by an independent agent on its own) and distributed collective learning (the learning is done by the agents as a group). Modeling multi-agent systems is a complex task due to the environmental dynamics, the action and state spaces, and the types of agents. In fact, many real-world domains have very

https://doi.org/10.1016/j.neucom.2018.08.094