Deep Spatial Affordance Hierarchy: Spatial Knowledge Representation for Planning in Large-scale Environments

Andrzej Pronobis, University of Washington, Seattle, WA, USA. Email: pronobis@cs.washington.edu
Francesco Riccio, Sapienza University of Rome, Rome, Italy. Email: riccio@diag.uniroma1.it
Rajesh P. N. Rao, University of Washington, Seattle, WA, USA. Email: rao@cs.washington.edu

Abstract—Domain-specific state representations are a fundamental component that enables planning of robot actions in unstructured human environments. In the case of mobile robots, it is spatial knowledge that constitutes the core of the state and directly affects the performance of the planning algorithm. Here, we propose the Deep Spatial Affordance Hierarchy (DASH), a probabilistic representation of spatial knowledge spanning multiple levels of abstraction, from geometry and appearance to semantics, and leveraging a deep model of generic spatial concepts. DASH is designed to represent space from the perspective of a mobile robot executing complex behaviors in the environment, and directly encodes gaps in knowledge as well as spatial affordances. In this paper, we explain the principles behind DASH and present its initial realization for a robot equipped with a laser-range sensor. We demonstrate the ability of our implementation to successfully build representations of large-scale environments, and to leverage the deep model of generic spatial concepts to infer latent and missing information at all abstraction levels.

I. INTRODUCTION

Many recent advancements in the fields of robotics and artificial intelligence have been driven by the ultimate goal of creating artificial agents able to perform service tasks in real environments in collaboration with humans [22, 23, 9].
While significant progress has been made in the area of robot control, largely thanks to the success of deep learning [13], we are still far from solving more complex scenarios that require forming plans spanning large spatio-temporal horizons. In such scenarios, domain-specific state representations play a crucial role in determining the capabilities of the agent and the tractability of the solution. In the case of mobile robots operating in large-scale environments, it is the spatial knowledge that constitutes the core of the state. As a result, the way in which it is represented directly affects the actions the robot can plan for, the performance of the planning algorithm, and ultimately, the ability of the robot to successfully reach the goal. For complex tasks involving interaction with humans, the relevant spatial knowledge spans multiple levels of abstraction and spatial resolutions, including detailed geometry and appearance, global environment structure, and high-level semantic concepts. Representing such knowledge is a difficult task given the uncertainty and partial observability governing real applications in human environments.

In this work, we propose the Deep Spatial Affordance Hierarchy (DASH), a probabilistic representation of spatial knowledge designed to support and facilitate planning and execution of complex behaviors by a mobile robot. The representation encodes the belief about the state of the world as well as spatial affordances, i.e., the possibilities of actions on objects or locations in the environment. It does so by leveraging a hierarchy of sub-representations (layers), which represent multiple spatial knowledge abstractions (from geometry and appearance to semantic concepts) using different spatial resolutions (from voxels to places), frames of reference (allocentric or egocentric), and spatial scopes (from local to global). The structure of DASH corresponds to a hierarchical decomposition of the planning problem.
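The dimensions along which the layers vary (abstraction level, spatial resolution, frame of reference, and scope) can be summarized in a minimal sketch. The `Layer` class and the example layer names below are purely illustrative assumptions for this paper's description; they are not the actual DASH implementation.

```python
from dataclasses import dataclass

# Illustrative sketch only: the Layer class and the example entries are
# our own naming, not part of the actual DASH implementation.
@dataclass(frozen=True)
class Layer:
    name: str         # hypothetical layer name
    abstraction: str  # from geometry and appearance up to semantics
    resolution: str   # from voxels to places
    frame: str        # "egocentric" or "allocentric"
    scope: str        # "local" or "global"

# One possible layering, mirroring the dimensions listed in the text
hierarchy = [
    Layer("geometry",  "metric geometry and appearance", "voxel", "egocentric",  "local"),
    Layer("topology",  "global environment structure",   "place", "allocentric", "global"),
    Layer("semantics", "high-level semantic concepts",   "place", "allocentric", "global"),
]

for layer in hierarchy:
    print(f"{layer.name}: {layer.resolution}-level, {layer.frame}, {layer.scope}")
```

The point of the sketch is only that each layer pairs a knowledge abstraction with its own resolution, reference frame, and scope, which is what allows the hierarchy to mirror a hierarchical decomposition of the planning problem.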
Additionally, DASH is designed to explicitly represent and fill gaps in spatial knowledge due to uncertainty, unknown concepts, missing observations, or unexplored space. This opens the possibility of using the representation in open-world scenarios involving active exploration and learning. DASH includes both instance knowledge about the specific robot environment and default knowledge about typical human environments. The latter is modeled using the recently proposed Deep Generative Spatial Model (DGSM) [19]. DGSM leverages recent developments in deep learning, providing a fully probabilistic, generative model of spatial concepts learned directly from raw sensory data. DGSM unifies the layers of our representation, enabling upward and downward inferences about latent or missing spatial knowledge defined at various levels of abstraction.

In this paper, we describe the architecture of DASH and present its initial realization for a mobile robot equipped with a laser-range sensor. We perform a series of experiments demonstrating the ability of the representation to perform different types of inferences, including bottom-up inferences about semantic spatial concepts and top-down inferences about the geometry of the environment. We then showcase its ability to build semantic representations of large-scale environments (e.g., floors of an office building).
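The two inference directions mentioned above can be illustrated with a toy generative model. The tiny discrete joint distribution below is invented solely for this example and bears no relation to the actual DGSM architecture or parameters; it only shows how a single joint model answers both bottom-up queries (semantics from observed geometry) and top-down queries (expected geometry from semantics).

```python
# Toy joint distribution P(place_category, observed_geometry).
# All values here are invented for illustration, not DGSM parameters.
joint = {
    ("corridor", "narrow"): 0.30,
    ("corridor", "open"):   0.05,
    ("office",   "narrow"): 0.10,
    ("office",   "open"):   0.55,
}

def bottom_up(observation):
    """Infer P(category | observation): semantic concept from sensed geometry."""
    mass = {c: p for (c, o), p in joint.items() if o == observation}
    total = sum(mass.values())
    return {c: p / total for c, p in mass.items()}

def top_down(category):
    """Infer P(observation | category): expected geometry from a semantic concept."""
    mass = {o: p for (c, o), p in joint.items() if c == category}
    total = sum(mass.values())
    return {o: p / total for o, p in mass.items()}

print(bottom_up("narrow"))  # bottom-up: which category explains this geometry?
print(top_down("office"))   # top-down: what geometry does this category predict?
```

Because both queries condition the same joint model, knowledge missing at one level of abstraction can be inferred from evidence available at another, which is the role DGSM plays across the layers of DASH.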