In ICAPS-07 Workshop on AI Planning and Learning (AIPL-07), Providence, RI, September 2007.

Accelerating Search with Transferred Heuristics

Matthew E. Taylor, Gregory Kuhlmann, and Peter Stone
Department of Computer Sciences
The University of Texas at Austin
Austin, Texas 78712-1188
{mtaylor, kuhlmann, pstone}@cs.utexas.edu

Abstract

A common goal for transfer learning research is to show that a learner can solve a source task and then leverage the learned knowledge to solve a target task faster than if it had learned the target task directly. A more difficult goal is to reduce the total training time so that learning the source task and the target task is faster than learning only the target task. This paper addresses the second goal by proposing a transfer hierarchy for 2-player games. Such a hierarchy orders games in terms of relative solution difficulty and can be used to select source tasks that are faster to learn than a given target task. We empirically test transfer between two types of tasks in the General Game Playing domain, the testbed for an international competition developed at Stanford. Our results show that transferring learned search heuristics from tasks in different parts of the hierarchy can significantly speed up search, even when the source and target tasks differ along a number of important dimensions.

Introduction

If you cannot solve the proposed problem try to solve first some related problem. Could you imagine a more accessible related problem? . . . Could you solve a part of the problem? (Polya 1945, p. xvii)

Polya's 1945 book How To Solve It motivates the general principle behind transfer learning (TL). In this machine learning paradigm, a learner first solves a source task and then uses its knowledge to solve a target task. Rather than learning a difficult target task directly, consider the following three-step TL process:

1. The learner must find or construct a source task that is relevant to, but simpler than, a target task.
Full details of the specific target task may or may not be available during this phase.

2. The learner must solve the simple source task with relatively little effort compared to solving the full target task.

3. The learner must transfer the knowledge gained from the source task and use it to solve the target task.

A typical goal in TL research is to reduce the time needed to learn a target task after first learning a source task, relative to learning the target task without transfer. This target task goal can be achieved whenever the learner can transfer useful information from the source task into the target task. The majority of TL research to date has focused on this goal (step 3), demonstrating both the feasibility of transfer and the many dimensions along which the source and target tasks may differ while still allowing transfer. In these TL scenarios, the relevant source task or tasks are generally provided to the learner for each target task.

A more difficult goal is to reduce the total training time so that learning the source task and the target task is faster than learning the target task directly. The total time goal is attainable only if the source task (called an auxiliary problem by Polya) is faster to solve than the target task, and the speedup in target task training time outweighs the time spent learning the source task. To achieve this goal the learner must reason about all three steps. This paper takes a first step toward the difficult problem of discovering appropriate source tasks by proposing a transfer hierarchy. Such a structure defines types of games that require more or less information to solve and thus may be used to order tasks by their relative solution complexity.

Copyright (c) 2007, Association for the Advancement of Artificial Intelligence (www.aaai.org). All rights reserved.
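The total time goal described above reduces to a simple inequality: the cost of solving the source task plus the (reduced) cost of solving the target task after transfer must be less than the cost of solving the target task directly. A minimal sketch of this criterion, using hypothetical timing numbers (the function name and all values are illustrative, not the paper's actual measurements):

```python
def transfer_pays_off(t_source, t_target_with_transfer, t_target_direct):
    """Total time goal: solving the source task plus the sped-up
    target task must beat solving the target task directly."""
    return t_source + t_target_with_transfer < t_target_direct

# Hypothetical example: a cheap source task (5 units of effort) and a
# target task whose direct solution costs 100 units but only 66 after
# transfer. Total with transfer: 5 + 66 = 71 < 100, so transfer pays off.
print(transfer_pays_off(5.0, 66.0, 100.0))   # True

# If the source task were expensive (50 units), the total would be
# 50 + 66 = 116 > 100, and the total time goal would not be met.
print(transfer_pays_off(50.0, 66.0, 100.0))  # False
```

This makes explicit why the choice of source task matters: even a large speedup on the target task is wasted if the source task itself is too costly to solve.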
Such an ordering can be used to identify source tasks that will take significantly less time to solve than a particular target task, reducing the impact of source task training on the total training time. In the future we hope that such a transfer hierarchy will be used to help automate the transfer learning process by assisting in the selection of a source task for a given target task. In this paper we begin to evaluate the effectiveness of our proposed hierarchy by manually constructing source tasks for a specified target task, where the selection of source tasks is motivated by the transfer hierarchy.

To empirically demonstrate transfer between source and target tasks taken from our transfer hierarchy, we use the game of Mummy Maze. This game is an appropriate choice for two reasons. First, it has been released as a sample domain in the General Game Playing (GGP) contest (Genesereth & Love 2005), an international competition developed at Stanford. Second, the Mummy Maze task is easily modifiable so that it can conform to each task type in our transfer hierarchy. Our results show that a transferred heuristic is able to improve the speed of search by as much as 34%, meeting the target task goal, even when our source tasks differ from the target tasks along a number of dimensions. Additionally, we demonstrate how the total training time goal may also be met for this particular pair of source and target types, depending on information gathering costs.

A Transfer Hierarchy for Games

Mapping problem characteristics to the correct solution type is an important open problem for AI. For instance, given a control problem, should it be solved optimally or approximately? Is planning or reinforcement learning (RL) (Sutton & Barto 1998) more appropriate? If RL, should the solution be model-based or model-free?
This work assumes that such an appropriate mapping exists; given certain characteristics of a game, we propose an appropriate solution method. The characteristics we select are based on the amount of information provided to a player about the game's environment and opponent.

For instance, if a learner has a full model of the effects of actions and knows how its opponent will react in any situation, the learner may determine an optimal solution by "thinking" through the task using dynamic programming (DP) (Bellman 1957). At the other extreme, a learner may have to make decisions in a task where