Integrating Knowledge-based Reasoning and Data-driven Learning in Robotics

Mohan Sridharan
Intelligent Robotics Lab, School of Computer Science, University of Birmingham, UK
Email: m.sridharan@bham.ac.uk

Abstract—This paper summarizes work on an architecture for robots that combines the complementary strengths of knowledge-based reasoning and data-driven learning. The architecture supports non-monotonic logical reasoning and probabilistic reasoning with incomplete commonsense domain knowledge. Reasoning triggers and guides learning of previously unknown domain knowledge when needed based on deep learning and reinforcement learning methods. Furthermore, the architecture enables the robot to provide relational descriptions of its decisions and the evolution of beliefs during reasoning and learning. The architecture’s capabilities are illustrated and evaluated in simulation and on physical robots.

I. MOTIVATION

As an illustrative example, consider a robot assistant (RA) domain in which a robot has to: (a) deliver target objects to particular people or rooms; and (b) estimate and revise the occlusion of objects and stability of object configurations in a particular room. There is uncertainty in the robot’s perception and actuation. The robot’s incomplete domain knowledge includes commonsense knowledge, e.g., statements such as “books are usually in the study” that hold in all but a few exceptional circumstances, e.g., cookbooks are in the kitchen. The robot also extracts information from noisy sensor inputs, with quantitative measures of uncertainty, e.g., “I am 90% certain I saw the robotics book in office-1”. In addition, the robot has some prior knowledge of object attributes such as size, surface, and shape; grounding of some prepositional words such as “above” and “in” representing the spatial relations between objects; and some axioms governing domain dynamics.
Examples of these axioms include:
• Placing an object on top of another with an irregular surface results in instability.
• An object can only be in one location at a time.
• An object below another object cannot be picked up.

The robot reasons with the knowledge and observations for inference, planning, and diagnostics. In any practical domain, it will have to revise this knowledge over time; this is often accomplished by data-driven (e.g., deep, reinforcement) learning methods that process observations, labeled datasets, and/or human input. Also, enabling the robot to describe its decisions and the evolution of beliefs at different levels of abstraction will lead to more effective collaboration with humans. Our architecture seeks to support these capabilities by exploiting the complementary strengths of declarative logic programming, probabilistic reasoning, and data-driven interactive learning. We briefly describe the architecture’s components below.

II. ARCHITECTURE OVERVIEW

Our baseline architecture for knowledge representation, explainable reasoning, and interactive learning is based on tightly-coupled transition diagrams at different resolutions. It may be viewed as a logician, a statistician, and a creative explorer working together; Figure 1 presents an overview of this architecture. The different transition diagrams are described using an action language AL_d [3], which has a sorted signature with statics, fluents, and actions, and supports three types of statements: causal laws, state constraints, and executability conditions; the fluents can be non-Boolean and axioms can be non-deterministic. Depending on the domain and tasks at hand, the robot chooses to plan and execute actions at two specific resolutions, but can construct and provide explanations at other resolutions; for ease of understanding, we limit our discussion to two resolutions in this paper.
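To make the three statement types concrete, the sketch below encodes the three informal axioms from Section I in Python, one per statement type. It is an illustration only, not the architecture's implementation, which uses the action language and a logic program rather than Python. All function names, the set-of-tuples fact representation, and the example objects ("cup", "ball", "book") are assumptions introduced here.

```python
# Illustrative sketch only; every name and data structure here is hypothetical.

def violates_unique_location(loc_facts):
    """State constraint: an object can only be in one location at a time."""
    seen = {}
    for obj, place in loc_facts:
        # setdefault records the first place seen for obj and returns it later
        if seen.setdefault(obj, place) != place:
            return True
    return False


def can_pick_up(obj, on_facts):
    """Executability condition: an object below another cannot be picked up."""
    return all(bottom != obj for _top, bottom in on_facts)


def place_on(top, bottom, on_facts, irregular, unstable):
    """Causal law: placing an object on an irregular surface causes instability."""
    new_on = on_facts | {(top, bottom)}
    new_unstable = unstable | ({top} if bottom in irregular else set())
    return new_on, new_unstable


# Place a cup on a ball, assuming the ball is known to have an irregular surface.
on, unstable = place_on("cup", "ball", set(), irregular={"ball"}, unstable=set())
```

In this example the cup becomes unstable, the ball can no longer be picked up because the cup is on top of it, and a set of location facts that puts one book in two rooms is reported as a violation. Keeping the three checks as separate functions mirrors the separation of statement types; a declarative encoding would instead state them as rules and leave their application to a solver.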
Knowledge representation and reasoning: The coarse-resolution domain description comprises a system description D_c of transition diagram τ_c, which is a collection of AL_d statements, and a history H_c. D_c comprises a sorted signature Σ_c and axioms. For the RA domain, Σ_c includes basic sorts such as place, thing, robot, person, object, cup, size, surface, and step; statics such as next_to(place, place) and obj_surface(obj, surface); fluents such as loc(thing, place), obj_rel(relation, object, object), and in_hand(entity, object); and actions such as move(robot, place), pickup(robot, object), putdown(robot, object, location), and give(robot, object, person). Axioms in D_c include statements such as:

move(rob_1, P) causes loc(rob_1, P)
loc(O, P) if loc(rob_1, P), in_hand(rob_1, O)
impossible give(rob_1, O, P) if loc(rob_1, L_1) ≠ loc(P, L_2)

that correspond to a causal law, a state constraint, and an executability condition, respectively.

The history H_c of a dynamic domain is typically a record of fluents observed to be true or false at a particular time step, and the occurrence of actions at a particular time step. This definition is expanded to represent prioritized defaults describing the values of fluents in the initial state, i.e., statements such as “books are usually in the library; if not there, they are in the office” with the exception “cookbooks are in the kitchen”.

To reason with the domain description, we construct program Π(D_c, H_c) in CR-Prolog, a variant of Answer Set Prolog