Hyperbolic Embeddings for Learning Options in Hierarchical Reinforcement Learning

Saket Tiwari, University of Massachusetts, Amherst
M. Prannoy, University of Massachusetts, Amherst

Proceedings of the 22nd International Conference on Artificial Intelligence and Statistics (AISTATS) 2019, Naha, Okinawa, Japan. PMLR: Volume 89. Copyright 2019 by the author(s).

Abstract

Hierarchical reinforcement learning deals with the problem of breaking down large tasks into meaningful sub-tasks. Autonomous discovery of these sub-tasks has remained a challenging problem. We propose a novel method of learning sub-tasks that combines paradigms of routing in computer networks and graph-based skill discovery within the options framework [Sutton et al., 1999] to define meaningful sub-goals. We apply recent advances in learning embeddings with Riemannian optimisation to embed the state set into hyperbolic space and create a model of the environment. In doing so we enforce a global topology on the states and are able to exploit this topology to learn meaningful sub-tasks. We demonstrate empirically, in both discrete and continuous domains, how these embeddings can improve the learning of meaningful sub-tasks.

1 INTRODUCTION

Hierarchical reinforcement learning methods enable agents to tackle challenging problems by breaking them down into smaller ones. There are numerous criteria for breaking a problem down into smaller parts. One criterion could be the re-usability of skills. Another approach is to define short-term sub-tasks, which helps the agent by breaking a task into a sequence of meaningful chunks. For example, a human being faced with the mundane task of boarding an airplane will break the task down into multiple sub-tasks. The person has to pack their bags. Then they need to get to the airport, which involves splitting this sub-task into sub-sub-tasks like getting out of the home, going to the bus stop and boarding the right bus.
After that they will have to go through security and finally board the flight by getting to the gate. This gives them a sequence of sub-tasks, namely: pack bags, get out of the home, go to the bus stop, get on the bus, get down at the airport, clear security and finally board the flight at the gate. Even though this kind of breakdown comes naturally to humans, forming meaningful sub-tasks in the context of a reinforcement learning problem is challenging and falls under the purview of hierarchical reinforcement learning methods. Several mathematical frameworks for hierarchical reinforcement learning have been proposed, including hierarchies of machines [Parr and Russell, 1998], MAXQ [Dietterich, 2000], and the options framework [Sutton et al., 1999].

Numerous heuristic methods for learning meaningful sub-tasks have been proposed [Machado et al., 2017, Şimşek and Barto, 2008, Thrun and Schwartz, 1995, Konidaris and Barto, 2009, McGovern and Barto, 2001]. The recently proposed option-critic framework [Bacon et al., 2017] splits tasks under the options framework by directly optimizing for the return. The Feudal network based approach [Vezhnevets et al., 2017] defines managers that assign a goal and workers that take actions moving the agent in the direction of those goals. The hierarchical deep reinforcement learning framework [Kulkarni et al., 2016] works within the options framework by defining a goal set and assigning intrinsic rewards upon reaching these goals.

We propose a novel method for finding meaningful skills. We do so by exploiting the geometry and topology of hyperbolic spaces. The hyperbolic space, due to its underlying geometry, has been shown to be effective in capturing hierarchies [Nickel and Kiela, 2017, Ganea et al., 2018, Nickel and Kiela, 2018] and also for routing data packets in real-world computer networks [Krioukov et al., 2010].
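To give a concrete feel for why hyperbolic geometry suits hierarchies, the following is a minimal sketch of the geodesic distance in the Poincaré ball, the model used by Nickel and Kiela [2017]. Choosing this model for illustration is an assumption, and the function name `poincare_distance` and the sample points are hypothetical, not taken from this paper.

```python
import math


def poincare_distance(u, v):
    """Geodesic distance between two points strictly inside the unit ball.

    Uses d(u, v) = arcosh(1 + 2 * |u - v|^2 / ((1 - |u|^2) * (1 - |v|^2))).
    """
    sq_norm_u = sum(x * x for x in u)
    sq_norm_v = sum(x * x for x in v)
    if sq_norm_u >= 1.0 or sq_norm_v >= 1.0:
        raise ValueError("points must lie strictly inside the unit ball")
    sq_diff = sum((a - b) ** 2 for a, b in zip(u, v))
    denom = (1.0 - sq_norm_u) * (1.0 - sq_norm_v)
    return math.acosh(1.0 + 2.0 * sq_diff / denom)


# Hypothetical layout: a "root" at the origin and two "leaves" near the boundary.
root = (0.0, 0.0)
left_leaf = (0.9, 0.0)
right_leaf = (0.0, 0.9)

d_root_leaf = poincare_distance(root, left_leaf)
d_leaf_leaf = poincare_distance(left_leaf, right_leaf)
euclid_leaf_leaf = math.dist(left_leaf, right_leaf)

print("root -> leaf (hyperbolic):", d_root_leaf)
print("leaf -> leaf (hyperbolic):", d_leaf_leaf)
print("leaf -> leaf (Euclidean): ", euclid_leaf_leaf)
```

In this sketch the hyperbolic distance between the two leaves is several times their Euclidean distance and approaches the length of the path that passes through the root, which is the tree-like behaviour that makes such embeddings attractive for hierarchical structure.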
arXiv:1812.01487v1 [cs.LG] 4 Dec 2018

The essential idea, in the context of routing data packets using hyperbolic