Learning a Super Mario Controller from Examples of Human Play

Geoffrey Lee (geoff.lee@rmit.edu.au), School of Computer Science and Information Technology, RMIT University, Australia
Min Luo (s3195794@student.rmit.edu.au), School of Computer Science and Information Technology, RMIT University, Australia
Fabio Zambetta (fabio.zambetta@rmit.edu.au), School of Computer Science and Information Technology, RMIT University, Australia
Xiaodong Li (xiaodong.li@rmit.edu.au), School of Computer Science and Information Technology, RMIT University, Australia

Abstract—Imitating human-like behaviour in action games is a challenging but intriguing task in Artificial Intelligence research, with various strategies being employed to solve the human-like imitation problem. In this research we consider learning human-like behaviour via Markov decision processes without being explicitly given a reward function, learning to perform the task by observing an expert's demonstrations. Individual players often have characteristic styles when playing the game, and this method attempts to find the behaviours which make them unique. During play sessions of Super Mario we calculate players' behaviour policies and reward functions by applying inverse reinforcement learning to the players' in-game actions. We conducted an online questionnaire which displays two video clips, where one is played by a human expert and the other by the designed controller based on the player's policy. We demonstrate that, by using apprenticeship learning via Inverse Reinforcement Learning, we are able to obtain an optimal policy which yields performance close to that of a human expert playing the game, at least under specific conditions.

I. INTRODUCTION

The game industry has been expanding rapidly for the past few decades and is the fastest-growing component of the international media sector.
The industry has devoted considerable resources to designing highly sophisticated graphical content and challenging, believable Artificial Intelligence (AI). Various AI methods have been employed in modern video games to engage players for longer; game agents built with human-like behaviour and cooperation raise the players' emotional involvement and increase immersion in the game simulation.

To better develop human-like behaviour in game agents, an AI technique called imitation learning has been developed, which allows an AI to learn from observation. It was originally applied with success to robot manufacturing processes [1]. Preliminary work on imitation learning focused on the task of motion planning for artificial opponents in first-person shooter games [2], but modelling game AI through imitation learning is seen to have great potential for more games than just first-person shooters. A particularly fitting challenge for imitation learning has been posed by the Super Mario Turing Test AI competition [3], [4], whose goal is to develop an artificial controller that plays Super Mario in a human-like fashion. With this challenge in mind, this paper presents work on realising a controller for the game Super Mario by applying Apprenticeship Learning via Inverse Reinforcement Learning (IRL) [5], a high-performance method of imitation learning.

Imitating player behaviour has several key benefits. We can create games with more intelligent and believable NPCs^1, with opponents that do not react to players in a pre-determined fashion regardless of context [6]. Game play can be dynamically altered to adapt to different players according to their features of play (their playing "style" as well as their "skill") to sustain their engagement with the game for longer [7]. Finally, learned AI agents can help game companies test the strength of game AI and discover defects or limitations before release to the market [8].
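As a concrete illustration of the technique named above, apprenticeship learning via IRL (Abbeel and Ng [5]) alternates between estimating the expert's discounted feature expectations from demonstration trajectories and projecting towards them to recover a reward weight vector. The sketch below shows those two core computations; the function names, toy features, and the surrounding RL solver (omitted here) are our own illustrative assumptions, not code from the paper.

```python
import numpy as np

def feature_expectations(trajectories, phi, gamma=0.9):
    """Empirical discounted feature expectations:
    mu = (1/m) * sum over trajectories of sum_t gamma^t * phi(s_t)."""
    mu = np.zeros_like(phi(trajectories[0][0]), dtype=float)
    for traj in trajectories:
        for t, s in enumerate(traj):
            mu += (gamma ** t) * phi(s)
    return mu / len(trajectories)

def projection_step(mu_expert, mu_bar, mu_new):
    """One iteration of the projection variant of apprenticeship
    learning: project mu_expert onto the line through mu_bar and
    mu_new, then take w = mu_expert - mu_bar as the new reward
    weights.  Returns (w, updated mu_bar, margin t = ||w||)."""
    d = mu_new - mu_bar
    mu_bar = mu_bar + (d @ (mu_expert - mu_bar)) / (d @ d) * d
    w = mu_expert - mu_bar
    return w, mu_bar, np.linalg.norm(w)

# Toy example: two-dimensional feature expectations.
mu_E = np.array([1.0, 0.0])          # expert
mu_bar = np.array([0.0, 0.0])        # current projection point
mu_new = np.array([0.5, 0.5])        # latest learned policy
w, mu_bar, t = projection_step(mu_E, mu_bar, mu_new)
```

In the full algorithm this step repeats, each time solving the MDP with reward R(s) = w . phi(s) to produce the next policy's feature expectations, until the margin t falls below a tolerance.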
In order to implement and test the application of IRL to a Super Mario controller, this paper investigates how to design an appropriate knowledge representation for the Super Mario testbed environment. We investigate the MDP framework and the inverse reinforcement learning algorithms that allow us to learn a controller via imitation learning. We also address objective methods to evaluate our controller's performance, and how best to evaluate whether our controller's behaviour is human-like.

Our contributions include a solution for representing knowledge in the Super Mario game within an MDP framework, to which we applied apprenticeship learning via IRL. We present two experiments using self-reporting that form the basis of a "Super Mario Turing Test", and we provide an experimental analysis of why AL/IRL holds promise for providing human-like controllers and eventually passing this modified Turing test.

The rest of this paper is organized as follows: Section II describes the basic techniques and algorithms which serve as foundations, and briefly reviews the current state of the art in human-behaviour modelling. Section III presents the proposed framework used to generate our experimental results. In Section IV we discuss experimental results, analysing the convergence results for apprenticeship learning and presenting and discussing quantitative results of the questionnaires. Finally, Section V summarises our results and lays out opportunities for future work.

II. BACKGROUND AND RELATED WORK

A. Mario, AI and Turing Tests

Super Mario has been used to study human-like behaviour since the 1990s. In 1992, John and Vera used GOMS (Goals, Operations, Methods and Selection rules) to predict the behaviour of an expert in Super Mario Bros [9]. Using only the information in the booklet and some hand-coded heuristics,

^1 Non-Player Characters

2014 IEEE Congress on Evolutionary Computation (CEC), July 6-11, 2014, Beijing, China. 978-1-4799-1488-3/14/$31.00 ©2014 IEEE
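The knowledge-representation question raised above, how to cast Super Mario game states into an MDP on which IRL can run, typically comes down to mapping each raw game frame to a compact feature vector phi(s) over which a linear reward R(s) = w . phi(s) is learned. The sketch below is a hypothetical discretization of our own devising; the actual features used in the paper's representation are described in its Section III, and the feature names here are illustrative assumptions only.

```python
import numpy as np

def state_features(mario_mode, enemy_ahead, gap_ahead, obstacle_ahead, can_jump):
    """Hypothetical feature vector phi(s) for a Mario-like MDP state.

    mario_mode: 0 = small, 1 = big, 2 = fire (one-hot encoded).
    The remaining arguments are booleans describing the cells
    immediately ahead of Mario in the level grid.
    """
    mode = np.zeros(3)
    mode[mario_mode] = 1.0
    flags = np.array([enemy_ahead, gap_ahead, obstacle_ahead, can_jump],
                     dtype=float)
    # Concatenated binary features; a learned weight vector w over
    # these defines the reward R(s) = w @ phi(s) recovered by IRL.
    return np.concatenate([mode, flags])
```

Keeping the feature vector short and binary like this keeps the induced state space small enough for standard MDP solvers, at the cost of aliasing distinct game situations onto the same features.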