Language Support for Multi Agent Reinforcement Learning
Tony Clark
Aston University
Birmingham, UK
tony.clark@aston.ac.uk
Balbir Barn
Middlesex University
London, UK
b.barn@mdx.ac.uk
Vinay Kulkarni, Souvik Barat
TCS Research
Pune, India
vinay.vkulkarni@tcs.com
souvik.barat@tcs.com
ABSTRACT
Software Engineering must increasingly address the issues of com-
plexity and uncertainty that arise when systems are to be deployed
into a dynamic software ecosystem. There is also interest in using
digital twins of systems in order to design, adapt and control them
when faced with such issues. The use of multi-agent systems in
combination with reinforcement learning is an approach that will
allow software to intelligently adapt to respond to changes in the
environment. This paper proposes a language extension that en-
capsulates learning-based agents and system building operations
and shows how it is implemented in ESL. The paper includes examples
of the key features and describes the application of agent-based
learning implemented in ESL to a real-world supply chain.
CCS CONCEPTS
· Software and its engineering → Multiparadigm languages.
KEYWORDS
Agents, Reinforcement Learning
ACM Reference Format:
Tony Clark, Balbir Barn, Vinay Kulkarni, and Souvik Barat. 2020. Language
Support for Multi Agent Reinforcement Learning. In 13th Innovations in
Software Engineering Conference (formerly known as India Software Engineering
Conference) (ISEC 2020), February 27–29, 2020, Jabalpur, India. ACM,
New York, NY, USA, 11 pages. https://doi.org/10.1145/3385032.3385041
1 INTRODUCTION
The current era of digitisation, enabled through the technologies
underpinning the so-called Fourth Industrial Revolution, such as the
ubiquity of sensors, big data, artificial intelligence and low-latency
telecommunications [29], has led to increasing design complexity
and results in systems that must be deployed into uncertain
environments. Such systems, for example production plants,
logistics networks, IT service companies, and international financial
companies, are complex systems of systems that operate in highly
dynamic environments and require rapid response to change. The
characteristic features of such systems include scale, complex
interactions, knowledge of behaviour limited to localised contexts,
and inherent uncertainty.
Permission to make digital or hard copies of all or part of this work for personal or
classroom use is granted without fee provided that copies are not made or distributed
for profit or commercial advantage and that copies bear this notice and the full citation
on the first page. Copyrights for components of this work owned by others than ACM
must be honored. Abstracting with credit is permitted. To copy otherwise, or republish,
to post on servers or to redistribute to lists, requires prior specific permission and/or a
fee. Request permissions from permissions@acm.org.
ISEC 2020, February 27–29, 2020, Jabalpur, India
© 2020 Association for Computing Machinery.
ACM ISBN 978-1-4503-7594-8/20/02…$15.00
https://doi.org/10.1145/3385032.3385041
Our hypothesis is that these issues can be addressed through the
use of multi-agent reinforcement learning (MARL) [13], which has
traditionally been used in simulation to design controllers for robots,
resource management, and automated trading systems. Our proposal
is that MARL is potentially a much more fundamental concept
that can be used to address intra- and inter-system complexity by
allowing general systems to achieve optimal behaviour through
dynamic learning. Similarly, design problems arising from the desire
for an algorithmically deterministic solution can be addressed by
deferring the resolution of non-determinism to run-time.
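To ground the discussion, the dynamic learning referred to above can be illustrated with a minimal tabular Q-learning update, the classic single-agent building block that MARL generalises. This is a hedged Python sketch; the states, actions, and parameter values are illustrative assumptions, not material from the paper or from ESL:

```python
from collections import defaultdict

def q_update(Q, s, a, r, s2, actions, alpha=0.1, gamma=0.9):
    # Move Q(s, a) toward the bootstrapped target r + gamma * max_a' Q(s', a')
    best_next = max(Q[(s2, a2)] for a2 in actions)
    Q[(s, a)] += alpha * (r + gamma * best_next - Q[(s, a)])

# Toy example: taking 'go' in state 0 yields reward 1 and reaches state 1,
# from which no further reward is learned, so Q(0, 'go') converges to 1.0.
Q = defaultdict(float)
actions = ["go", "stay"]
for _ in range(200):
    q_update(Q, 0, "go", 1.0, 1, actions)
print(round(Q[(0, "go")], 2))  # prints 1.0
```

An agent repeating such updates while interacting with its environment adapts its behaviour without that behaviour having been fixed at design time, which is the sense in which non-determinism is resolved at run-time.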
The use of MARL to develop and deploy general-purpose systems
raises many problems. The design of such systems requires agent
goals to be expressed effectively and in such a way that they do not
conflict. In any non-trivial system, agents need to interact and
collaborate to achieve shared goals, which is an area of active research
[18]. A MARL-based system also needs to address composition as a
first-class concern: a mechanism by which collaborating agents,
each with their own learning capabilities, can be combined in an
optimised way is an essential requirement for a complex system of
systems.
Language support for MARL-based system development is not
widespread; for example, AgentSpeak provides support for agents,
but not reinforcement learning [4]. Recent research has been
successful in integrating reinforcement learning and agent-oriented
programming [5], but without introducing a new language construct
with the potential for static analysis. Typically, reinforcement
learning libraries provide dynamic APIs that limit the potential
for tool-supported verification and analysis of system definitions
[32]. Semantic constructs have been proposed for agent-based
reinforcement learning [3] without the associated language integration
proposed in this paper. Our proposal is the same as that described
by Simpkins et al. [31], which is that programming languages, and
thereby Software Engineering, will benefit by using reinforcement
learning in order to become more adaptive. Our work goes further
than their A²BL language by showing how a strongly typed
language can be extended with both an agent construct with
inheritance and associated system building and transformation
operations.
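ESL's actual agent syntax is introduced later in the paper; as a hedged illustration of the idea being claimed here, encapsulating a learned policy inside an agent construct that supports inheritance, consider the following Python sketch. All names (`LearningAgent`, `ReplenishmentAgent`, the states and actions) are our own illustrative assumptions and are not ESL constructs:

```python
import random
from collections import defaultdict

class LearningAgent:
    """An agent that encapsulates its own reinforcement-learning policy."""
    def __init__(self, actions, alpha=0.1, gamma=0.9, epsilon=0.1):
        self.Q = defaultdict(float)   # learned state-action values
        self.actions = actions
        self.alpha, self.gamma, self.epsilon = alpha, gamma, epsilon

    def choose(self, state):
        # Epsilon-greedy: mostly exploit learned values, occasionally explore
        if random.random() < self.epsilon:
            return random.choice(self.actions)
        return max(self.actions, key=lambda a: self.Q[(state, a)])

    def learn(self, s, a, r, s2):
        # One-step Q-learning update, hidden behind the agent's interface
        best = max(self.Q[(s2, a2)] for a2 in self.actions)
        self.Q[(s, a)] += self.alpha * (r + self.gamma * best - self.Q[(s, a)])

class ReplenishmentAgent(LearningAgent):
    """Inheritance reuses the learning machinery and adds domain behaviour."""
    def __init__(self):
        super().__init__(actions=["order", "wait"])

    def handle(self, stock_level):
        return self.choose(stock_level)

# Training drives the agent to prefer ordering when stock is low
agent = ReplenishmentAgent()
for _ in range(500):
    agent.learn("low", "order", 1.0, "ok")
    agent.learn("low", "wait", -1.0, "low")
```

The point of making the agent a language construct rather than a library class, as proposed in this paper, is that definitions like these become amenable to static type checking and analysis rather than being assembled through dynamic APIs.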
This paper provides a contribution to the issue of language support
for MARL. Section 2 describes the motivation for our approach
in more detail and establishes the problem to be addressed. It also
describes approaches to MARL provided by other technology
platforms, establishing that our approach is novel and that, if
effective, it provides a contribution to system development. Section 3
establishes the requirements on a language-based approach and
describes an extension to the language ESL that has been developed
to support MARL. Section 4 shows how a supply chain can be
implemented using the language feature together with a demonstration