Relational Temporal Difference Learning

Nima Asgharbeygi    nimaa@stanford.edu
David Stracuzzi    stracudj@csli.stanford.edu
Pat Langley    langley@csli.stanford.edu

Center for the Study of Language and Information, Computational Learning Laboratory, Stanford University, Stanford, CA 94305 USA

Abstract

We introduce relational temporal difference learning as an effective approach to solving multi-agent Markov decision problems with large state spaces. Our algorithm uses temporal difference reinforcement to learn a distributed value function represented over a conceptual hierarchy of relational predicates. We present experiments using two domains from the General Game Playing repository, in which we observe that our system achieves higher learning rates than non-relational methods. We also discuss related work and directions for future research.

1. Background and Motivation

Most research in AI views intelligent behavior as search through a problem space to achieve goals. Directing that search is crucial to an agent's success, but crafting search-control heuristics manually is difficult and prone to error. An alternative response is to acquire such heuristic knowledge through learning. One common approach formulates this task as learning control policies from delayed reward, with policies encoded by expected value functions over Markov decision processes (Sutton & Barto, 1998). This general approach to reinforcement learning has been studied in many settings and from many perspectives.

Most work in this tradition uses limited representations and downplays the role of background knowledge. As a result, typical systems search a very large state space and thus learn far more slowly than do humans placed in similar situations. Research on temporal abstraction (Dietterich, 2000) and state abstraction (Asadi & Huber, 2004) aims to increase learning rates, but few efforts have utilized the more powerful relational representations that are standard in other AI subfields. Recent work on relational reinforcement learning (Dzeroski et al., 2001) uses first-order representations to provide effective abstraction, but it does not take advantage of action models, which are an important source of knowledge in many domains.

In this paper, we report a new approach to learning from delayed reward in multi-player games. Our framework is similar to relational reinforcement learning in its reliance on first-order representations. However, it employs a variant of temporal differencing, which is more appropriate than Q-learning when an action model is available, as Tesauro (1994) and Baxter et al. (1998) have demonstrated.

As in Dzeroski et al.'s work, we use a relational representation to support effective generalization across states, which should produce more rapid learning. However, rather than using relational regression trees to encode expected values, we use a factored representation that associates component values with relational predicates. These are combined into an overall score, much as in traditional state evaluation functions. Our work offers a novel approach to combining ideas from relational reinforcement learning and feature-based temporal difference learning.
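To make the general idea concrete, the following is a minimal sketch, not the authors' actual system, of a factored value function in which each relational predicate carries a learned component value, the state value is their sum (as in a linear evaluation function), weights are adjusted by a TD(0)-style update, and moves are chosen by one-step lookahead through an action model rather than by learning Q-values. The helpers `active_predicates`, `successor`, and `legal_moves` are hypothetical placeholders for domain-supplied functions.

```python
from collections import defaultdict


class FactoredTDValue:
    """Sketch of a factored, predicate-based value function with TD updates."""

    def __init__(self, alpha=0.1, gamma=1.0):
        self.weights = defaultdict(float)  # one value component per relational predicate
        self.alpha = alpha                 # learning rate
        self.gamma = gamma                 # discount factor

    def value(self, state, active_predicates):
        # Overall score: sum of the components for predicates that hold in `state`.
        return sum(self.weights[p] for p in active_predicates(state))

    def td_update(self, state, next_state, reward, active_predicates):
        # TD(0) target from the successor state; the error is shared among
        # the predicates active in the current state.
        target = reward + self.gamma * self.value(next_state, active_predicates)
        error = target - self.value(state, active_predicates)
        for p in active_predicates(state):
            self.weights[p] += self.alpha * error

    def greedy_move(self, state, legal_moves, successor, active_predicates):
        # With an action model available, act by one-step lookahead on the
        # learned state values instead of learning Q(s, a) directly.
        return max(legal_moves(state),
                   key=lambda m: self.value(successor(state, m), active_predicates))
```

The design choice this illustrates is that tying value components to relational predicates lets experience in one state update the evaluation of every state in which the same predicates hold, which is the source of the hoped-for generalization and faster learning.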
In the next section, we describe our representation of states, moves, and expected values, the performance system that uses this knowledge to play games, and our method for relational temporal difference learning. We then present experiments designed to demonstrate the advantages of this approach. This includes discussion of the general game playing domain and the specific games on which we evaluate our method. We conclude by discussing related work and outlining directions for future research on relational learning from delayed reward.