Human-like Bots for Unreal Tournament 2004: A Q-learning Approach to Refine UT2004 Strategies

Anshul Goyal                                Philippe Pasquier
School of Interactive Arts and Technology   School of Interactive Arts and Technology
Simon Fraser University, Surrey, BC         Simon Fraser University, Surrey, BC
anshulg@sfu.ca                              pasquier@sfu.ca

ABSTRACT
Defining human-like behavior for bots or non-player characters (NPCs) in a real-time game such as Unreal Tournament 2004 (UT2004) has always been a challenging task. Dynamic game environments require the bot to behave autonomously, without human supervision or intervention. We show how Behavior Trees can be used to define an agent's behavior, and how a Reinforcement Learning (RL) technique can teach the bot to choose actions so as to maximize a numerical reward. A further motivation for creating an agent that behaves like a human is participation in the BotPrize competition [3], a Turing-test-like contest for intelligent agents in which bots must convince human judges that they are human while playing the First Person Shooter (FPS) UT2004.

Keywords
Video Games, Unreal Tournament 2004, Behavior Trees, Reinforcement Learning, Q-learning.

1. INTRODUCTION
Computer games are becoming increasingly popular as testbeds for AI methodologies, since they provide real-time, dynamic simulation environments. Virtual characters or NPCs in games should evolve dynamically as the world around them changes due to internal and external events. The work presented in this paper focuses on creating agents capable of convincing people that they are human players. Making an NPC behave like a human, with the same kinds of reactions and limitations, is a challenging task in the field of AI; modeling the behavior of an NPC in a virtual environment thus reflects aspects of behavior in the real world. In order to fulfill our goal, we used the concept of Behavior Trees [13].
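As a minimal illustration of the concept, a behavior tree can be sketched as a small set of composable nodes. The node classes, state keys, and thresholds below are our own illustration and are not part of Pogamut, GameBots2004, or the notation of [13]:

```python
# Minimal behavior-tree sketch; all names and thresholds are illustrative,
# not part of Pogamut or UT2004.

SUCCESS, FAILURE = "SUCCESS", "FAILURE"

class Sequence:
    """Ticks children in order; fails as soon as one child fails."""
    def __init__(self, *children):
        self.children = children

    def tick(self, state):
        for child in self.children:
            if child.tick(state) == FAILURE:
                return FAILURE
        return SUCCESS

class Selector:
    """Ticks children in order; succeeds as soon as one child succeeds."""
    def __init__(self, *children):
        self.children = children

    def tick(self, state):
        for child in self.children:
            if child.tick(state) == SUCCESS:
                return SUCCESS
        return FAILURE

class Leaf:
    """Wraps a condition or action: a predicate over the game state."""
    def __init__(self, fn):
        self.fn = fn

    def tick(self, state):
        return SUCCESS if self.fn(state) else FAILURE

def act(name):
    """An action leaf that records the chosen action and always succeeds."""
    def run(state):
        state["action"] = name
        return True
    return Leaf(run)

# Shoot only when an enemy is visible, close enough, and ammo remains;
# otherwise fall back to exploring the map.
behavior = Selector(
    Sequence(
        Leaf(lambda s: s["enemy_visible"]),
        Leaf(lambda s: s["distance"] < 500),
        Leaf(lambda s: s["ammo"] > 0),
        act("shoot"),
    ),
    act("explore"),
)

state = {"enemy_visible": True, "distance": 300, "ammo": 12}
behavior.tick(state)
print(state["action"])  # shoot
```

The simple "shoot" branch above can be reused under other parent nodes or replaced by a richer subtree without touching the rest of the tree, which is the modularity the design relies on.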
Behavior Trees provide a simple and modular approach to defining an agent's behavior. An AI behaves differently in different situations, or may sometimes show the same behavior in two unrelated situations. For example, when a bot sees another player, its first intention is not necessarily to start shooting; rather, the bot considers various attributes, such as how far away the player is, what kind of weapon it has, and whether the weapon has enough ammo, before making any decision. Designers can define simple behaviors and either reuse them in different situations or combine them to form complex behaviors. Behavior Trees come in handy in this regard, as they are easy to model and to understand. A more detailed discussion of Behavior Trees can be found in Section 3.2.

We use an RL [18] approach to maximize reward by choosing, from the set of available actions, the action that best fits a particular game state. RL is an area of machine learning based on state-action pairs and policies that allows the agent to learn through experience. An RL agent interacts with the environment in discrete time steps. Beginning from the start state, the agent performs an action in the environment and moves to a new state. The environment rewards the agent, indicating how well it performed according to the reward function. Several RL algorithms have been developed over the years, including Temporal Difference (TD) learning, Q-learning, and Sarsa. We focus on the Q-learning algorithm here.

To connect our agent to UT2004 [17], different APIs and middleware are required. UT2004 provides an add-on interface known as GameBots2004 [8] (GB2004), which is used to connect clients to the game over a TCP/IP connection. Clients can issue commands to and receive messages from GB2004. To ease the development of agents, we use a framework called Pogamut [5]. Pogamut wraps the low-level details of connecting clients to GB2004 and offers a high-level API that focuses on
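The tabular Q-learning update underlying this approach can be sketched as follows. The states, actions, rewards, and hyperparameter values here are illustrative toy choices, not the actual state and action space used by our bot:

```python
# Tabular Q-learning sketch with a toy state/action space (illustrative only).
import random
from collections import defaultdict

ALPHA, GAMMA, EPSILON = 0.1, 0.9, 0.2   # learning rate, discount, exploration
ACTIONS = ["attack", "retreat", "collect_item"]  # hypothetical action set
Q = defaultdict(float)                   # Q[(state, action)] -> expected return

def choose_action(state):
    """Epsilon-greedy selection: explore with probability EPSILON, else exploit."""
    if random.random() < EPSILON:
        return random.choice(ACTIONS)
    return max(ACTIONS, key=lambda a: Q[(state, a)])

def update(state, action, reward, next_state):
    """One Q-learning step: Q(s,a) += alpha * (r + gamma * max_a' Q(s',a') - Q(s,a))."""
    best_next = max(Q[(next_state, a)] for a in ACTIONS)
    Q[(state, action)] += ALPHA * (reward + GAMMA * best_next - Q[(state, action)])

# Toy experience: retreating at low health pays off, attacking does not.
for _ in range(200):
    update("low_health", "retreat", 1.0, "safe")
    update("low_health", "attack", -1.0, "dead")

print(max(ACTIONS, key=lambda a: Q[("low_health", a)]))  # retreat
```

After repeated updates the greedy policy at "low_health" prefers "retreat", which is exactly the experience-driven action selection the agent uses to refine its strategy.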