Training a Reinforcement Learning Agent based on XCS in a Competitive Snake Environment

Johannes Büttner
Games Engineering, University of Würzburg, Würzburg, Germany
johannes.buettner@uni-wuerzburg.de

Sebastian von Mammen
Games Engineering, University of Würzburg, Würzburg, Germany
sebastian.von.mammen@uni-wuerzburg.de

Abstract—In contrast to neural networks, learning classifier systems are not a “black box” algorithm. They provide rule-based artificial intelligence that can easily be analysed, interpreted and even adapted by humans. We constructed an agent based on such a learning classifier system and trained it to play in a competitive snake environment using reinforcement learning and self-play methods. Our preliminary experiments show promising results that we plan to build on in the future.

Index Terms—Extended Classifier System, Game-playing AI, Reinforcement Learning, Snake

I. INTRODUCTION

In recent years, game-playing artificial intelligence (AI) has achieved super-human levels of competency in many games, a feat that was previously thought impossible. DeepMind’s AlphaZero [1] uses a deep neural network to learn a policy that guides Monte Carlo tree search, and it beat a world-champion program in each of the games of chess, shogi, and Go. To achieve this, it was trained solely by reinforcement learning (RL) with self-play. While the feats achieved by artificial neural networks in deep reinforcement learning have been impressive, multiple problems remain to be solved to improve our understanding and the efficiency of AI. An inherent problem of artificial neural networks is that they are a “black box”: even their designers cannot explain why the AI has come to a specific decision. The complex nature of deep neural networks has given rise to an entire research field, called “explainable AI” [2], devoted to the question of why an AI made a specific decision.
Other machine learning methods do not have these issues but are designed as transparent and interpretable algorithms. Alas, research on them has been minuscule compared to that on deep neural networks. Another problem of current state-of-the-art RL algorithms lies in their time-consuming training process. Even though the final results are exceptional, the initial version of the AI has no knowledge about the rules of a given game and takes very long to reach even an amateur’s competency. For example, OpenAI Five was trained for 180 days, each day consisting of 180 years’ worth of games, before competing against and beating human players [3], [4]. By providing an AI with knowledge about the game and basic correct behavior, the learning process could be kick-started and the time needed to reach exceptional levels drastically reduced. While the name might be a rather confusing term nowadays, a “learning classifier system” is a rule-based machine learning algorithm that was originally proposed in 1978 [5]. The rules used by a learning classifier system are of a simple IF-THEN form and can easily be analysed by humans. Furthermore, these rules can not only be read but also written by humans. This way, learning classifier systems can be given rules crafted by domain experts even before the training process starts. Thus, the initial phase of learning basic rules and strategies could possibly be shortened significantly. Therefore, our research focuses on learning classifier systems, which provide a rule-based, human-understandable machine learning framework that can be used for both supervised and reinforcement learning.

In this paper we present our approach to using reinforcement learning with self-play for an agent utilizing a learning classifier system in a competitive snake game. The remainder of this paper is structured as follows.
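To make the IF-THEN form mentioned above concrete, the following sketch shows what such a human-readable rule could look like in code. The ternary condition encoding and the rule fields are illustrative assumptions on our part, not the paper's actual representation.

```python
from dataclasses import dataclass

# Hypothetical sketch of an IF-THEN classifier rule as used in learning
# classifier systems. Conditions are strings over {'0', '1', '#'}, where
# '#' is a wildcard ("don't care") that matches either input bit.
@dataclass
class Classifier:
    condition: str     # IF-part: pattern matched against the binary state
    action: int        # THEN-part: action proposed when the rule matches
    prediction: float  # expected payoff, refined by reinforcement learning

def matches(rule: Classifier, state: str) -> bool:
    """True if every non-wildcard bit of the condition equals the state bit."""
    return all(c == '#' or c == s for c, s in zip(rule.condition, state))

# A rule a domain expert could hand-craft and inject before training, e.g.
# "IF an obstacle lies directly ahead (first bit set) THEN turn (action 1)."
rule = Classifier(condition="1###", action=1, prediction=50.0)

print(matches(rule, "1010"))  # obstacle ahead: rule fires -> True
print(matches(rule, "0110"))  # no obstacle: rule stays silent -> False
```

Because such rules are plain, inspectable data, they can be read or edited by humans at any point before or during training, which is the interpretability advantage argued for above.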
In Section II we review related work regarding typical approaches to AI in snake games, the extended classifier system (XCS) and an improved variant of it, and previous uses of XCS in games. In Section III we describe the learning environment and the reinforcement learning approaches we used. In Section IV we present experiments we conducted to evaluate different configurations of our system. We present the corresponding results in Section V. Finally, we conclude with a summary of the presented work and possible future work in Section VI.

II. RELATED WORK

In this section we first describe the XCS framework. Second, a variant of the XCS using inductive reasoning is presented. Finally, we present relevant works using an XCS-based AI in games.

A. Extended Classifier System

The XCS [6] is built to receive input about an environment's state through detectors and to return an action that changes the environment's state (see Fig. 1). This action is then rewarded, which updates the inner mechanisms of the XCS. Original implementations of the XCS used binary strings as input. As real-world problems often need to be represented by real