Language Support for Multi Agent Reinforcement Learning

Tony Clark, Aston University, Birmingham, UK, tony.clark@aston.ac.uk
Balbir Barn, Middlesex University, London, UK, b.barn@mdx.ac.uk
Vinay Kulkarni, Souvik Barat, TCS Research, Pune, India, vinay.vkulkarni@tcs.com, souvik.barat@tcs.com

ABSTRACT
Software Engineering must increasingly address the issues of complexity and uncertainty that arise when systems are deployed into a dynamic software ecosystem. There is also interest in using digital twins of systems in order to design, adapt and control them when faced with such issues. The use of multi-agent systems in combination with reinforcement learning is an approach that allows software to adapt intelligently in response to changes in its environment. This paper proposes a language extension that encapsulates learning-based agents and system building operations, and shows how it is implemented in ESL. The paper includes examples of the key features and describes the application of agent-based learning, implemented in ESL, to a real-world supply chain.

CCS CONCEPTS
• Software and its engineering → Multiparadigm languages.

KEYWORDS
Agents, Reinforcement Learning

ACM Reference Format:
Tony Clark, Balbir Barn, Vinay Kulkarni, and Souvik Barat. 2020. Language Support for Multi Agent Reinforcement Learning. In 13th Innovations in Software Engineering Conference (formerly known as India Software Engineering Conference) (ISEC 2020), February 27–29, 2020, Jabalpur, India. ACM, New York, NY, USA, 11 pages. https://doi.org/10.1145/3385032.3385041

1 INTRODUCTION
The current era of digitisation, enabled by the technologies underpinning the so-called Fourth Industrial Revolution, such as the ubiquity of sensors, big data, artificial intelligence and low-latency telecommunications [29], has led to increasing design complexity and results in systems that must be deployed into uncertain environments.
Such systems, for example production plants, logistics networks, IT service companies, and international financial companies, are complex systems of systems that operate in highly dynamic environments requiring rapid response to change. The characteristic features of such systems include scale, complex interactions, knowledge of behaviour that is limited to localised contexts, and inherent uncertainty.

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from permissions@acm.org.
ISEC 2020, February 27–29, 2020, Jabalpur, India
© 2020 Association for Computing Machinery.
ACM ISBN 978-1-4503-7594-8/20/02...$15.00
https://doi.org/10.1145/3385032.3385041

Our hypothesis is that these issues can be addressed through multi-agent reinforcement learning (MARL) [13], which has traditionally been used in simulation to design controllers for robots, resource management, and automated trading systems. Our proposal is that MARL is potentially a much more fundamental concept that can be used to address intra- and inter-system complexity by allowing general systems to achieve optimal behaviour through dynamic learning. Similarly, design problems arising from a desire to achieve an algorithmically deterministic solution can be addressed by leaving non-determinism in the run-time. The use of MARL to develop and deploy general-purpose systems raises many problems.
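To make the notion of an agent achieving behaviour "through dynamic learning" concrete, the following is a minimal, self-contained sketch of a tabular Q-learning agent in Python. It is purely illustrative: the class name, parameters and update rule are standard reinforcement-learning textbook material, not the ESL constructs introduced later in this paper.

```python
import random

class QLearningAgent:
    """Minimal tabular Q-learning agent (illustrative sketch, not the ESL construct)."""

    def __init__(self, actions, alpha=0.1, gamma=0.9, epsilon=0.1):
        self.q = {}                 # (state, action) -> estimated value
        self.actions = actions      # the discrete actions available to the agent
        self.alpha = alpha          # learning rate
        self.gamma = gamma          # discount factor for future rewards
        self.epsilon = epsilon      # exploration probability

    def choose(self, state):
        # Epsilon-greedy: mostly exploit the best known action, sometimes explore.
        if random.random() < self.epsilon:
            return random.choice(self.actions)
        return max(self.actions, key=lambda a: self.q.get((state, a), 0.0))

    def learn(self, state, action, reward, next_state):
        # Standard Q-learning update:
        # Q(s,a) <- Q(s,a) + alpha * (r + gamma * max_a' Q(s',a') - Q(s,a))
        best_next = max(self.q.get((next_state, a), 0.0) for a in self.actions)
        old = self.q.get((state, action), 0.0)
        self.q[(state, action)] = old + self.alpha * (reward + self.gamma * best_next - old)
```

In a multi-agent setting, each component of the system would host such a learner and the agents' reward signals would need to be designed so that their goals do not conflict, which is precisely the language-level concern this paper addresses.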
The design of such systems requires agent goals to be expressed effectively and in such a way that they do not conflict. In any non-trivial system, agents need to interact and collaborate to achieve shared goals, which is an area of active research [18]. A MARL-based system also needs to address composition as a first-class function. A mechanism by which collaborating agents, each with its own learning capabilities, can be combined in an optimised way is an essential requirement for a complex system of systems.

Language support for MARL-based system development is not widespread; for example, AgentSpeak provides support for agents but not for reinforcement learning [4]. Recent research has been successful in integrating reinforcement learning and agent-oriented programming [5], but without introducing a new language construct with the potential for static analysis. Typically, reinforcement learning libraries provide dynamic APIs that limit the potential for tool-supported verification and analysis of system definitions [32]. Semantic constructs have been proposed for agent-based reinforcement learning [3] without the associated language integration proposed in this paper. Our proposal shares the position described by Simpkins et al. [31], namely that programming languages, and thereby Software Engineering, will benefit from reinforcement learning in order to become more adaptive. Our work goes further than their A2BL language by showing how a strongly typed language can be extended with both an agent construct with inheritance and associated system building and transformation operations.

This paper provides a contribution to the issue of language support for MARL. Section 2 describes the motivation for our approach in more detail and establishes the problem to be addressed.
It also describes approaches to MARL provided by other technology platforms, establishing that our approach is novel and that, if effective, it provides a contribution to system development. Section 3 establishes the requirements on a language-based approach and describes an extension to the language ESL that has been developed to support MARL. Section 4 shows how a supply chain can be implemented using the language feature, together with a demonstration