Structural Abstraction Experiments in Reinforcement Learning

Robert Fitch¹, Bernhard Hengst¹, Dorian Šuc¹, Greg Calbert², and Jason Scholz²

¹ National ICT Australia, University of NSW, Australia
{robert.fitch,bernhard.hengst,dorian.suc}@nicta.com.au
² Defence Science and Technology Organization, Salisbury SA, Australia
{greg.calbert,jason.scholz}@dsto.defence.gov.au

Abstract. A challenge in applying reinforcement learning to large problems is how to manage the explosive increase in storage and time complexity. This is especially problematic in multi-agent systems, where the state space grows exponentially in the number of agents. Function approximation based on simple supervised learning is unlikely to scale to complex domains on its own, but structural abstraction that exploits system properties and problem representations shows more promise. In this paper, we investigate several classes of known abstractions: 1) symmetry, 2) decomposition into multiple agents, 3) hierarchical decomposition, and 4) sequential execution. We compare memory requirements, learning time, and solution quality empirically in two problem variations. Our results indicate that the most effective solutions come from combinations of structural abstractions, and encourage development of methods for automatic discovery in novel problem formulations.

1 Introduction

When specifying a problem such as learning to walk, learning to manipulate objects, or learning to play a game as a reinforcement learning (RL) problem, the number of states and actions is often too large for the learner to manage. It is straightforward to describe the state of a system using several variables to represent the various attributes of its components. Similarly, individual system component actions can be used to describe a joint action vector. However, this approach very quickly leads to an intractable specification of the RL problem.
A tabular state-action representation requires storage proportional to the product of the sizes of all the state and action variables, leading to intractable storage and time complexity. Function approximation can often help by generalizing the value function across many states. Function approximation is based on supervised learning; gradient-descent methods, such as artificial neural networks and linear function approximation, are frequently used in RL for this purpose [1]. However, there are reasons to believe that simple function approximation will not scale to larger, more complex problems.
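The multiplicative growth of a tabular representation can be made concrete with a small sketch. The function and the grid-world sizes below are illustrative assumptions, not taken from the paper; the point is only that the entry count of a flat Q-table is the product of the state- and action-variable sizes, and so grows exponentially with the number of agents.

```python
# Illustrative sketch (hypothetical example, not the paper's code):
# count the entries of a flat tabular Q-function over a joint
# state-action space described by per-variable sizes.
from math import prod

def q_table_entries(state_var_sizes, action_var_sizes):
    """Entries in a Q-table: product of all state and action variable sizes."""
    return prod(state_var_sizes) * prod(action_var_sizes)

# One agent on a 10x10 grid with 4 actions: 10 * 10 * 4 = 400 entries.
single = q_table_entries([10, 10], [4])

# Three such agents with a joint state and joint action vector:
# the sizes multiply, giving 400**3 = 64,000,000 entries.
joint = q_table_entries([10, 10] * 3, [4] * 3)

print(single, joint)
```

Even this modest three-agent toy problem already exceeds sixty million table entries, which is why the structural abstractions studied in this paper aim to avoid enumerating the joint space directly.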