Execution Monitoring as Meta-Games for General Game-Playing Robots

David Rajaratnam and Michael Thielscher
The University of New South Wales
Sydney, NSW 2052, Australia
{daver,mit}@cse.unsw.edu.au

Abstract

General Game Playing aims to create AI systems that can understand the rules of new games and learn to play them effectively without human intervention. The recent proposal for general game-playing robots extends this to AI systems that play games in the real world. Execution monitoring becomes a necessity when moving from a virtual to a physical environment, because in reality actions may not be executed properly and (human) opponents may make illegal game moves. We develop a formal framework for execution monitoring by which an action theory that provides an axiomatic description of a game is automatically embedded in a meta-game for a robotic player, called the arbiter, whose role is to monitor and correct failed actions. This allows for the seamless encoding of recovery behaviours within a meta-game, enabling a robot to recover from these unexpected events.

1 Introduction

General game playing is the attempt to create a new generation of AI systems that can understand the rules of new games and then learn to play these games without human intervention [Genesereth et al., 2005]. Unlike specialised systems such as the chess program Deep Blue, a general game player cannot rely on algorithms that have been designed in advance for specific games. Rather, it requires a form of general intelligence that enables the player to autonomously adapt to new and possibly radically different problems. General game-playing robots extend this capability to AI systems that play games in the real world [Rajaratnam and Thielscher, 2013].

Execution monitoring [Hähnel et al., 1998; De Giacomo et al., 1998; Fichtner et al., 2003] becomes a necessity when moving from a purely virtual to a physical environment, because in reality actions may not be executed properly and (human) opponents may make moves that are not sanctioned by the game rules. In a typical scenario, a robot follows a plan generated by a traditional planning algorithm. As it executes each action specified by the plan, the robot monitors the environment to ensure that the action has been successfully executed. If an action is not successfully executed, then some recovery or re-planning behaviour is triggered. While the sophistication of execution monitors may vary [Pettersson, 2005], a common theme is that the execution monitor is independent of any action planning components. This allows for a simplified model where it is unnecessary to incorporate complex monitoring and recovery behaviour into the planner.

In this paper, we develop a framework for execution monitoring for general game-playing robots that follows a similar model. From an existing game axiomatised in the general Game Description Language GDL [Genesereth et al., 2005; Love et al., 2006], a meta-game is generated that adds an execution monitor in the form of an arbiter player. The "game" being played by the arbiter is to monitor the progress of the original game to ensure that the moves played by each player are valid. If the arbiter detects an illegal or failed move, then it has the task of restoring the game to a valid state. Importantly, the non-arbiter players, whether human or robotic, can ignore the arbiter player and reason without regard to it, while the arbiter becomes active only when an error state is reached.

Our specific contributions are: (1) A fully axiomatic approach to embedding an arbitrary GDL game into a meta-game that implements a basic execution monitoring strategy relative to a given physical game environment.
This meta-game is fully axiomatised in GDL so that any GGP player can take on the role of the arbiter and thus be used for execution monitoring. (2) Proofs that the resulting meta-game satisfies important properties including being a well-formed GDL game. (3) Generalisations of the basic recovery behaviour to consider actions that are not reversible but instead may involve multiple actions in order to recover the original game state and that need to be planned by the arbiter.

The remainder of the paper is as follows. Section 2 briefly introduces the GDL language for axiomatising games. Section 3 presents a method for embedding an arbitrary GDL game into a GDL meta-game for execution monitoring. Section 4 investigates and proves important properties of the resulting game description, and Sections 5 and 6 describe and formalise two extensions of the basic recovery strategy.

2 Background: General Game Playing, GDL

The annual AAAI GGP Competition [Genesereth et al., 2005] defines a general game player as a system that can understand the rules of an n-player game given in the general Game Description Language (GDL) and is able to play those games effectively. Operationally a game consists of a central