Hierarchical Assignment of Behaviours to Subpolicies

Wilco Moerman 1, Bram Bakker 2 & Marco Wiering 3
1 Cognitive Artificial Intelligence, Utrecht University, wilco.moerman@gmail.com
2 Intelligent Autonomous Systems Group, University of Amsterdam, bram@science.uva.nl
3 Intelligent Systems Group, Utrecht University, marco@cs.uu.nl

Abstract

Task decompositions are central in Hierarchical Reinforcement Learning, but in most approaches they need to be designed a priori, and the agent only needs to fill in the details in the fixed structure. In contrast, the algorithm presented here autonomously identifies behaviours in an abstract higher level state space. Subpolicies self-organise to specialize for the high level behaviours that are identified.

1 Introduction

The use of hierarchies in Reinforcement Learning (RL) is one of the strategies for dealing with large state spaces. The idea is to improve normal, flat Reinforcement Learning by giving it the possibility to execute actions that are temporally extended. The two most common ways to achieve this are the introduction of multiple layers (prime example: MAXQ [1]) and the use of the options framework [2].

In the options framework, extended actions (options) are added to the flat Reinforcement Learning algorithm, directly augmenting the action space. This allows taking larger steps, but does not really introduce layers of abstraction.

Introducing layers, on the other hand, allows for the use of more abstract representations or states (although not every layered approach uses abstractions). For approaches like MAXQ, the designer needs to define a task decomposition, determining which task is done by which (sub)policy. The agent only needs to fill in the values in the value functions, because the structure of the task, and which subtasks are done by which policies, is already largely fixed, and only the (sub)policies need to be learned.

The algorithm presented here takes a different approach.
Instead of thinking in terms of detailed task decompositions, a suitable geometric abstract state space (a higher level representation of the normal state space) is used for the higher level(s), and subpolicies self-organise to cover (i.e. specialize for) the needed behaviours identified in an abstract Behaviour Space.

2 Behaviour Space and Abstract State Space

Our method is based on having or identifying an abstract, high-level, geometric state space which captures important properties of the underlying task and which is used for taking higher-level actions to be executed by specialist lower-level subpolicies.

For such an abstract state space (or any state space, for that matter) we define a Behaviour Space as the set of all possible difference vectors in that state space (see fig. 1). This means that the Behaviour Space consists of all possible vectors that are confined within the dimensions of the state space. The actions that actually occur in the state space (because they are transitions from one state to another) form a subset of all possible behaviour vectors.

Figure 1: Behaviour Space: the space of all possible difference vectors in a state space (with dimensions b1, b2, b3).

A suitable abstract representation of the problem space has the following properties: states that are close together in the original state space need to be mapped to abstract states near each other (or the same), and neighbouring abstract states need to be mapped to states in the state space that are near each other (fig. 2). Also, a translation (difference vector) in the abstract state space should correspond to a meaningful change in the original state space. Furthermore, a useful abstract state space needs to be significantly smaller than the original state space. Finally, the actually occurring transitions between states in the abstract state space need to be distributed non-uniformly in the abstract Behaviour Space.
The last requirement is needed to ensure that many transitions are roughly the same, meaning there are “pockets” or “clusters” of actual transitions in the Behaviour Space.
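
The notion of behaviours as difference vectors, and of transitions clustering in the Behaviour Space, can be illustrated with a minimal Python sketch. It is not the paper's algorithm: the function names `behaviour_vectors` and `top_k_coverage`, the toy two-dimensional trajectory, and the use of exact vector counts as a crude stand-in for clustering are all assumptions made for illustration.

```python
from collections import Counter


def behaviour_vectors(abstract_states):
    """Return the difference vector for each consecutive pair of abstract states.

    Hypothetical helper: each transition s -> s' is represented as s' - s.
    """
    return [
        tuple(b - a for a, b in zip(prev, nxt))
        for prev, nxt in zip(abstract_states, abstract_states[1:])
    ]


def top_k_coverage(vectors, k):
    """Fraction of transitions accounted for by the k most frequent vectors.

    Assumption: counting identical vectors is only a rough proxy for the
    'pockets' of similar transitions; a real system would cluster nearby vectors.
    """
    if not vectors:
        return 0.0
    counts = Counter(vectors)
    covered = sum(n for _, n in counts.most_common(k))
    return covered / len(vectors)


# Illustrative trajectory through a 2-D abstract state space (made-up data).
trajectory = [(0, 0), (1, 0), (2, 0), (3, 0), (3, 1), (3, 2), (4, 2), (5, 2)]
vectors = behaviour_vectors(trajectory)
coverage = top_k_coverage(vectors, k=1)
```

In this toy trajectory, five of the seven transitions share the difference vector (1, 0), so a single behaviour already covers most of the observed transitions. A coverage close to 1 for small `k` is the kind of non-uniform distribution the last requirement asks for; a coverage near `k` divided by the number of distinct vectors would suggest the transitions are spread too evenly for a small set of subpolicies to specialize.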