Genetic Programming with External Memory in Sequence Recall Tasks

Mihyar Al Masalma, Dalhousie University, Halifax, NS, Canada, m.almasalma@dal.ca
Malcolm I. Heywood, Dalhousie University, Halifax, NS, Canada, mheywood@cs.dal.ca

ABSTRACT

Partially observable tasks imply that a learning agent has to recall previous state in order to make a decision in the present. Recent research with neural networks has investigated both internal and external memory mechanisms for this purpose, as well as proposing benchmarks to measure their effectiveness. These developments motivate our investigation using genetic programming and an external linked list memory model. A thorough empirical evaluation using a scalable sequence recall benchmark establishes the underlying strength of the approach. In addition, we assess the impact of decisions made regarding the instruction set and characterize the sensitivity to noise / obfuscation in the definition of the benchmarks. Compared to neural solutions to these benchmarks, GP extends the state-of-the-art to greater task depths than previously possible.

CCS CONCEPTS

· Computing methodologies → Genetic programming; Sequential decision making.

KEYWORDS

modularity, external memory, partially observable

ACM Reference Format:
Mihyar Al Masalma and Malcolm I. Heywood. 2022. Genetic Programming with External Memory in Sequence Recall Tasks. In Genetic and Evolutionary Computation Conference Companion (GECCO '22 Companion), July 9–13, 2022, Boston, MA, USA. ACM, New York, NY, USA, 4 pages. https://doi.org/10.1145/3520304.3528883

1 INTRODUCTION

Most learning agents are purely reactive, which is to say that their output is a function of the current input alone. This is sufficient for supervised learning tasks such as regression and classification, or unsupervised learning tasks such as clustering.
However, more general cognitive tasks, as encountered under partially observable state,¹ imply that an agent has to interact with the environment and recall events from the past in order to make decisions in the present. With this in mind, there has been something of a resurgence of interest in agents that support memory, particularly with respect to neural networks (e.g. [5–7, 10]). One motivation for this is that although recurrent neural networks are Turing Complete [11], this does not mean that finding the recurrent connectivity appropriate for solving a partially observable task is straightforward.

¹ For example, as often experienced in robotics, planning and process control.

© 2022 Copyright held by the owner/author(s). ACM ISBN 978-1-4503-9268-6/22/07.

Similar observations have motivated the use of memory with genetic programming (GP). Thus, adding indexed memory to tree structured GP also supports Turing Completeness [13], but does not necessarily result in the efficient development of internal state representations [1, 2]. Indeed, Langdon [8] in particular demonstrated that a prior decomposition of the memory interface (relative to an external data structure) can be beneficial when evolving solutions to partially observable tasks, i.e. signals controlling memory are associated with different programs.

In this work, we are interested in revisiting the use of coevolved modular GP controllers for external memory in partially observable tasks.
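As a rough illustration of the indexed-memory idea cited above [13] (a sketch under our own assumptions, not the interface used in this paper), read and write operations over a fixed-size memory can be exposed to tree GP as ordinary primitives; the names `mem_read` / `mem_write` and the index-wrapping rule here are illustrative only:

```python
# Hypothetical indexed-memory primitives for tree-structured GP.
# Indices produced by evolved subtrees are wrapped into range so that
# any real-valued argument maps to a legal cell.

MEM_SIZE = 8
memory = [0.0] * MEM_SIZE

def mem_read(i: float) -> float:
    """Return the contents of a memory cell addressed by a subtree result."""
    return memory[int(abs(i)) % MEM_SIZE]

def mem_write(i: float, v: float) -> float:
    """Store v in a cell and return v, so the node composes inside a tree."""
    memory[int(abs(i)) % MEM_SIZE] = v
    return v
```

Registering such primitives in a GP function set gives programs side effects that persist across time steps, which is what makes internal state representations possible (if, per [1, 2], not necessarily easy to evolve).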
Particular attention is given to the formulation of a list data structure and a coevolutionary modular approach for combining the different GP memory controllers into a cohesive solution. A benchmarking study is then performed over a set of scalable sequence recall tasks as recently employed for the purpose of assessing the efficiency of neural memory models [7, 10]. We are able to demonstrate general solutions to the sequence recall benchmarks and also illustrate the role that the instruction set plays in biasing the quality of solutions provided.

The balance of the paper begins by introducing the external memory model and formulation adopted for GP (§2). Specifically, we assume canonical tree structured GP as implemented in DEAP [3] and emphasize how GP interfaces to a list data structure. Section 3 characterizes the scalable sequence recall benchmark as previously proposed to assess the effectiveness of memory mechanisms in neural networks [7, 10]. Section 4 presents the benchmarking study, while conclusions are drawn in Section 5.

2 EXTERNAL MEMORY MODEL AND GP FORMULATION

We assume that GP will have the following general form:

- Canonical (tree structured) GP with an instruction set composed from arithmetic and / or logical operations as implemented in open source code distributions (DEAP assumed in this work [3]). This also implies that selection, variation and replacement operations are also generic. Naturally, this implies that GP is purely reactive, i.e. has no capacity for recurrent behaviours itself.
- External memory provides the mechanism for recalling previous state(s) and will be modelled as a list data structure. GP will therefore have to learn how to apply the list to solve memory tasks by choosing between one of A commands at
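Purely as an illustrative sketch (the command set of size A, the command semantics, and the winner-takes-all dispatch rule below are our assumptions, not this paper's specification), an external list memory driven by per-command GP programs might be organized as follows:

```python
from collections import deque

class ListMemory:
    """Hypothetical external list memory; the A = 4 commands are illustrative."""

    def __init__(self):
        self.cells = deque()

    def execute(self, cmd: int, value: float) -> float:
        # 0: no-op, 1: push value at the head, 2: pop the head, 3: peek at the head
        if cmd == 1:
            self.cells.appendleft(value)
        elif cmd == 2 and self.cells:
            return self.cells.popleft()
        elif cmd == 3 and self.cells:
            return self.cells[0]
        return 0.0

def step(programs, obs: float, memory: ListMemory) -> float:
    # One (purely reactive) GP program per command; the program with the
    # highest output on the current observation selects the command to fire.
    scores = [p(obs) for p in programs]
    cmd = scores.index(max(scores))
    return memory.execute(cmd, obs)
```

Under this decomposition each program remains stateless, in the spirit of Langdon's observation that signals controlling memory can be associated with different programs; all persistence lives in the list itself.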