Learning a Super Mario Controller from Examples of Human Play
Geoffrey Lee
School of Computer Science
and Information Technology
RMIT University, Australia
geoff.lee@rmit.edu.au
Min Luo
School of Computer Science
and Information Technology
RMIT University, Australia
s3195794@student.rmit.edu.au
Fabio Zambetta
School of Computer Science
and Information Technology
RMIT University, Australia
fabio.zambetta@rmit.edu.au
Xiaodong Li
School of Computer Science
and Information Technology
RMIT University, Australia
xiaodong.li@rmit.edu.au
Abstract—Imitating human-like behaviour in action games
is a challenging but intriguing task in Artificial Intelligence
research, with various strategies being employed to solve the
human-like imitation problem. In this research we consider
learning human-like behaviour via Markov decision processes
without being explicitly given a reward function, learning instead to
perform the task by observing an expert's demonstrations. Individual
players often have characteristic styles when playing the game,
and this method attempts to find the behaviours which make
them unique. During play sessions of Super Mario we estimate
players' behaviour policies and reward functions by applying
inverse reinforcement learning to their in-game actions.
We conduct an online questionnaire displaying two video
clips, one played by a human expert and the other by a
controller based on the learned player policy.
We demonstrate that by using apprenticeship learning via Inverse
Reinforcement Learning, we are able to obtain an optimal policy
whose performance approaches that of a human expert
playing the game, at least under specific conditions.
I. INTRODUCTION
The game industry has been rapidly expanding for the past
few decades and it is the fastest-growing component of the
international media sector. It has been devoting considerable
resources to design highly sophisticated graphical content and
challenging and believable Artificial Intelligence (AI). Various
artificial intelligence methods have been employed in modern
video games to keep players engaged for longer, including game
agents built with human-like behaviour and cooperation, which
raise players' emotional involvement and deepen immersion in
the game simulation.
To better develop human-like behaviour in game agents,
an AI technique called imitation learning has been developed
which allows an agent to learn from observation. It was
originally applied with success to robot manufacturing processes
[1]. Preliminary work on imitation learning focused on
motion planning for artificial opponents in first-person
shooter games [2], but modelling game AI through
imitation learning is seen to have great potential for more
games than just first-person-shooters.
A particularly fitting challenge for imitation learning has
been posed by the Super Mario Turing Test AI competition [3]
[4] whose goal is to develop an artificial controller that plays
Super Mario in a human-like fashion. With this challenge in
mind, this paper presents work on realising a controller for the
game Super Mario by applying Apprenticeship Learning via
Inverse Reinforcement Learning (IRL) [5], a high-performance
method of imitation learning.
Imitating player behaviour has several key benefits. We can
create games with more intelligent and believable NPCs
(Non-Player Characters), and
opponents that do not react to players in a pre-determined fash-
ion, regardless of context [6]. Game play can be dynamically
altered to adapt to different players by their features of play
(their playing “style” as well as their “skill”) to sustain their
engagement with the game for longer [7], and learned AI agents
can help game companies test the strength of their game
AI and discover defects or limitations before release to the
market [8].
In order to implement and test the application of IRL
to a Super Mario controller, this paper investigates how we
can design an appropriate knowledge representation for the
Super Mario testbed environment. We investigate the MDP
framework and the inverse reinforcement learning algorithms that
allow us to learn a controller via imitation learning. We
also address objective methods to evaluate our controller's
performance, and how best to evaluate whether its
behaviour is human-like.
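As an illustration of the learning step, the projection-method apprenticeship-learning loop of Abbeel and Ng [5] can be sketched as below. This is a minimal sketch, not the paper's implementation: the function `compute_mu`, which returns the feature expectations of a policy optimal under a candidate reward, is a hypothetical stand-in for a full MDP solver, and the three-option toy environment is invented purely for demonstration.

```python
import numpy as np

def apprenticeship_irl(mu_expert, compute_mu, n_iters=20, eps=1e-6):
    """Projection-method apprenticeship learning via IRL (Abbeel & Ng).

    mu_expert  -- the expert's feature expectations (k-vector).
    compute_mu -- maps reward weights w to the feature expectations
                  of a policy optimal under r(s) = w . phi(s).
    Returns the last reward weights and the final margin
    ||mu_expert - mu_bar||, which bounds the performance gap.
    """
    # Feature expectations of an arbitrary initial policy.
    mu_bar = compute_mu(-np.ones_like(mu_expert))
    w, t = mu_expert - mu_bar, np.linalg.norm(mu_expert - mu_bar)
    for _ in range(n_iters):
        w = mu_expert - mu_bar      # reward direction separating expert from mixture
        t = np.linalg.norm(w)       # current margin
        if t < eps:
            break
        mu = compute_mu(w)          # best-response policy's feature expectations
        # Orthogonal projection of mu_expert onto the segment mu_bar -> mu.
        d = mu - mu_bar
        alpha = np.clip(d @ (mu_expert - mu_bar) / (d @ d), 0.0, 1.0)
        mu_bar = mu_bar + alpha * d
    return w, t

# Hypothetical toy: each row is the feature vector of one deterministic
# policy; the "optimal" policy simply maximises w . phi over the rows.
phis = np.array([[1.0, 0.0], [0.0, 1.0], [0.7, 0.7]])
mu_expert = phis[2]  # the expert's expectations are realisable here
w, margin = apprenticeship_irl(mu_expert,
                               lambda w: phis[np.argmax(phis @ w)])
```

Because the expert's feature expectations are realisable in this toy, the margin shrinks to (near) zero within a few iterations; in a real MDP the loop terminates once the margin falls below the tolerance `eps`, and any policy mixing to `mu_bar` performs nearly as well as the expert under the unknown reward.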
Our contributions include a solution for representing
knowledge in the Super Mario game within an MDP framework,
on which we applied apprenticeship learning via IRL. We show
two experiments using self-reporting that form the basis of a
"Super Mario Turing Test", and we provide an experimental
analysis of why AL/IRL holds promise for providing human-
like controllers and eventually passing this modified Turing
test.
The rest of this paper is organized as follows: Section
II describes the basic techniques and algorithms which serve
as foundations, and briefly reviews the current state of the art
in human-behaviour modelling. Section III presents our proposed
framework, used to generate our experimental results. In Section
IV we discuss experimental results, analysing the convergence of
apprenticeship learning and presenting and discussing quantitative
results from the questionnaires. Finally, Section V summarises
our results and lays out opportunities for future work.
II. BACKGROUND AND RELATED WORK
A. Mario, AI and Turing Tests
Super Mario has been used to study human-like behaviour
since the 1990s. In 1992, John and Vera used GOMS (Goals,
Operators, Methods and Selection rules) to predict the be-
haviour of an expert in Super Mario Bros [9]. Using only the
information in the booklet and some hand-coded heuristics,
2014 IEEE Congress on Evolutionary Computation (CEC)
July 6-11, 2014, Beijing, China
978-1-4799-1488-3/14/$31.00 ©2014 IEEE