Coordinating Multiple Agents via Reinforcement Learning

Gang Chen¹, Zhonghua Yang¹, Hao He², and Kiah Mok Goh²

¹ Information Communication Institute of Singapore, School of Electrical and Electronic Engineering, Nanyang Technological University, Nanyang Avenue, Singapore 639798. Contact: eZhYang@ntu.edu.sg
² Singapore Institute of Manufacturing Technology, 71 Nanyang Drive, Singapore 638075

Abstract

In this paper, we focus on coordination issues in a multiagent setting. Two coordination algorithms based on reinforcement learning are presented and theoretically analyzed. Our Fuzzy Subjective Task Structure (FSTS) model is described and extended so that the information essential to agent coordination is effectively and explicitly modeled and incorporated into a general reinforcement learning structure. Compared with other learning-based coordination approaches, we argue that, owing to the explicit modeling and exploitation of the interdependencies among agents, our approach is more efficient and effective, and thus widely applicable.

1 Introduction

One of the fundamental issues in autonomous multiagent systems (MAS) is effective coordination among agents. The agents are required to coordinate among themselves in order to meet the system requirements or constraints and thus achieve a common goal. In typical manufacturing applications, some constraints are stringent (hard constraints): violating them causes the corresponding manufacturing task to fail. Other requirements are soft constraints, in the sense that violating them will probably degrade the overall performance of the manufacturing system but will not necessarily cause a failure. Many coordination techniques exist in the literature, including coordination via learning. In this paper we present two algorithms based on agent learning which explicitly explore the interdependencies among agents and incorporate them into a general learning structure.
For this purpose, we utilize and extend our Fuzzy Subjective Task Structure (FSTS) modeling framework [2]. As argued in the paper, these treatments make our learning algorithms more robust and better suited to real-life applications. Additionally, a theoretical analysis is outlined, which shows that under proper system conditions our algorithms can help agents make near-optimal decisions.

2 FSTS: Modeling Agent Coordination

Due to space limitations, we only briefly introduce our Fuzzy Subjective Task Structure (FSTS) model, which has been proposed for modeling general coordination issues in a task-oriented environment. The reader is referred to [2] for a full account. In a task-oriented environment, each task represents a goal to be pursued by a group of agents. In FSTS, each task is modeled as a set of meta-methods. A meta-method, denoted $\mathbf{m}$, is defined as a fuzzy set of domain operations, which are formally termed methods. Thus, a meta-method represents the group of methods that satisfy certain criteria ($Cr$) and are replaceable with each other. The membership degree of any method $m$ with respect to $\mathbf{m}$ is called the similarity degree of $m$, denoted $Similar_{\mathbf{m}}(m)$. It indicates the degree to which $m$ satisfies the criteria $Cr$.

In FSTS, the application constraints (for example, manufacturing constraints) are modeled in terms of the relations among methods and can be represented as directed graphs. Formally, a directed graph is a tuple of four components $g = \langle V, E, \epsilon, \beta \rangle$, where $V$ denotes the set of vertices, $E \subseteq V \times V$ represents the set of edges, $\epsilon : V \to L_V$ is the function assigning labels to vertices, and $\beta : E \to L_E$ is the function assigning labels to edges. There are two types of graphs, for hard and soft constraints respectively. For both types of graphs, $\epsilon$ assigns a meta-method to each vertex. We use $\mathbf{m}_u$ to denote the meta-method assigned to vertex $u$.
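The meta-method and labelled-graph definitions above can be made concrete with a small data-structure sketch. This is an illustration only, not the FSTS implementation from [2]: the class names `MetaMethod` and `ConstraintGraph`, the method names, and the edge label string are all hypothetical, and the fuzzy set is assumed to be stored as explicit (method, degree) pairs.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class MetaMethod:
    """Hypothetical fuzzy set of methods: (method name, membership degree) pairs."""
    name: str
    membership: tuple  # tuple of (method_name, degree) pairs, kept hashable

    def __post_init__(self):
        # Membership degrees of a fuzzy set must lie in [0, 1].
        for method, degree in self.membership:
            if not 0.0 <= degree <= 1.0:
                raise ValueError(f"degree of {method!r} must be in [0, 1]")

    def similar(self, method: str) -> float:
        """Similarity degree of `method` w.r.t. this meta-method (0.0 if absent)."""
        return dict(self.membership).get(method, 0.0)


@dataclass
class ConstraintGraph:
    """Labelled directed graph g = <V, E, epsilon, beta>."""
    vertices: set = field(default_factory=set)    # V
    edges: set = field(default_factory=set)       # E, a subset of V x V
    epsilon: dict = field(default_factory=dict)   # vertex -> MetaMethod
    beta: dict = field(default_factory=dict)      # edge -> label (assumed opaque here)

    def add_vertex(self, v, meta_method: MetaMethod) -> None:
        self.vertices.add(v)
        self.epsilon[v] = meta_method

    def add_edge(self, u, v, label) -> None:
        # Enforce E being a subset of V x V.
        if u not in self.vertices or v not in self.vertices:
            raise ValueError("both endpoints must already be vertices")
        self.edges.add((u, v))
        self.beta[(u, v)] = label


# Usage with made-up manufacturing methods and a placeholder edge label.
drilling = MetaMethod("drilling", (("drill_fast", 0.9), ("drill_precise", 0.6)))
polishing = MetaMethod("polishing", (("polish_manual", 1.0),))
g = ConstraintGraph()
g.add_vertex("u", drilling)
g.add_vertex("v", polishing)
g.add_edge("u", "v", "example-relation")
print(g.epsilon["u"].similar("drill_fast"))
```

Storing $\epsilon$ and $\beta$ as plain dictionaries keeps the sketch close to the tuple definition; a fuller model would replace the placeholder edge label with the relation types that the constraint graphs actually carry.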
In addition, there is an edge from vertex $u$ to vertex $v$, $u, v \in V$, if executing the methods belonging to $\mathbf{m}_u$ may affect the execution of the methods belonging to $\mathbf{m}_v$. For graphs modeling hard constraints, the function $\beta$ assigns a hard relation to each edge. Two classes