Integrating Knowledge-based Reasoning and Data-driven Learning in Robotics

Mohan Sridharan
Intelligent Robotics Lab, School of Computer Science, University of Birmingham, UK
Email: m.sridharan@bham.ac.uk

Abstract: This paper summarizes work on an architecture for robots that combines the complementary strengths of knowledge-based reasoning and data-driven learning. The architecture supports non-monotonic logical reasoning and probabilistic reasoning with incomplete commonsense domain knowledge. Reasoning triggers and guides learning of previously unknown domain knowledge when needed based on deep learning and reinforcement learning methods. Furthermore, the architecture enables the robot to provide relational descriptions of its decisions and the evolution of beliefs during reasoning and learning. The architecture’s capabilities are illustrated and evaluated in simulation and on physical robots.

I. MOTIVATION

As an illustrative example, consider a robot assistant (RA) domain in which a robot has to: (a) deliver target objects to particular people or rooms; and (b) estimate and revise the occlusion of objects and stability of object configurations in a particular room. There is uncertainty in the robot’s perception and actuation. The robot’s incomplete domain knowledge includes commonsense knowledge, e.g., statements such as “books are usually in the study” that hold in all but a few exceptional circumstances, e.g., cookbooks are in the kitchen. The robot also extracts information from noisy sensor inputs, with quantitative measures of uncertainty, e.g., “I am 90% certain I saw the robotics book in office-1”. In addition, the robot has some prior knowledge of object attributes such as size, surface, and shape; grounding of some prepositional words such as “above” and “in” representing the spatial relations between objects; and some axioms governing domain dynamics.
Examples of these axioms include:
• Placing an object on top of another with an irregular surface results in instability.
• An object can only be in one location at a time.
• An object below another object cannot be picked up.

The robot reasons with the knowledge and observations for inference, planning, and diagnostics. In any practical domain, it will have to revise this knowledge over time; this is often accomplished by data-driven (e.g., deep, reinforcement) learning methods that process observations, labeled datasets, and/or human input. Also, enabling the robot to describe its decisions and the evolution of beliefs at different levels of abstraction will lead to more effective collaboration with humans. Our architecture seeks to support these capabilities by exploiting the complementary strengths of declarative logic programming, probabilistic reasoning, and data-driven interactive learning. We briefly describe the architecture’s components below.

II. ARCHITECTURE OVERVIEW

Our baseline architecture for knowledge representation, explainable reasoning, and interactive learning is based on tightly-coupled transition diagrams at different resolutions. It may be viewed as a logician, a statistician, and a creative explorer working together; Figure 1 presents an overview of this architecture. The different transition diagrams are described using an action language AL_d [3], which has a sorted signature with statics, fluents, and actions, and supports three types of statements: causal laws, state constraints, and executability conditions; the fluents can be non-Boolean and axioms can be non-deterministic. Depending on the domain and tasks at hand, the robot chooses to plan and execute actions at two specific resolutions, but can construct and provide explanations at other resolutions; for ease of understanding, we limit our discussion to two resolutions in this paper.
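To make the three statement types concrete, the following is a minimal Python sketch that encodes the three example axioms from Section I as a causal law, a state constraint, and an executability condition. It is an illustration only and not the architecture's implementation: the architecture uses an action language, not Python, and the names State, put_on, can_pickup, and satisfies_constraints, as well as the cup and ball objects, are hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class State:
    """Hypothetical state: fluents as sets of ground atoms, statics as a dict."""
    loc: set = field(default_factory=set)        # fluent: (thing, place) pairs
    on: set = field(default_factory=set)         # fluent: (upper, lower) pairs
    unstable: set = field(default_factory=set)   # fluent: objects resting unstably
    surface: dict = field(default_factory=dict)  # static: object -> surface type


def satisfies_constraints(state: State) -> bool:
    """State constraint: an object can only be in one location at a time."""
    things = [thing for thing, _ in state.loc]
    return len(things) == len(set(things))


def can_pickup(state: State, obj: str) -> bool:
    """Executability condition: an object below another cannot be picked up."""
    return all(lower != obj for _, lower in state.on)


def put_on(state: State, upper: str, lower: str) -> State:
    """Causal law: placing an object on an irregular surface causes instability."""
    unstable = set(state.unstable)
    if state.surface.get(lower) == "irregular":
        unstable.add(upper)
    # Return a new state so that the previous state remains available as history.
    return State(loc=set(state.loc), on=state.on | {(upper, lower)},
                 unstable=unstable, surface=dict(state.surface))


# Illustrative scenario (assumed objects): a cup is placed on an irregular ball.
s0 = State(loc={("cup", "kitchen"), ("ball", "kitchen")},
           surface={"ball": "irregular", "cup": "regular"})
s1 = put_on(s0, "cup", "ball")

print(satisfies_constraints(s1))  # each object is still in exactly one place
print(can_pickup(s1, "ball"))     # the ball is now below the cup
print("cup" in s1.unstable)       # the cup rests on an irregular surface
```

In this sketch the causal law changes the state, the state constraint filters out inconsistent states, and the executability condition blocks an action before it is attempted; a declarative encoding would state the same three axioms as rules instead of functions.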
Knowledge representation and reasoning: The coarse resolution domain description comprises system description D_c of transition diagram τ_c, a collection of AL_d statements, and history H_c. D_c comprises sorted signature Σ_c and axioms. For the RA domain, Σ_c includes basic sorts such as place, thing, robot, person, object, cup, size, surface, and step; statics such as next_to(place, place) and obj_surface(obj, surface); fluents such as loc(thing, place), obj_rel(relation, object, object), and in_hand(entity, object); and actions such as move(robot, place), pickup(robot, object), putdown(robot, object, location), and give(robot, object, person). Axioms in D_c include statements such as:

    move(rob_1, P) causes loc(rob_1, P)
    loc(O, P) if loc(rob_1, P), in_hand(rob_1, O)
    impossible give(rob_1, O, P) if loc(rob_1, L_1) ≠ loc(P, L_2)

that correspond to a causal law, state constraint, and executability condition, respectively.

The history H_c of a dynamic domain is typically a record of fluents observed to be true or false at a particular time step, and the occurrence of actions at a particular time step. This definition is expanded to represent prioritized defaults describing the values of fluents in the initial state, i.e., statements such as “books are usually in the library; if not there, they are in the office” with the exception “cookbooks are in the kitchen”.

To reason with the domain description, we construct program Π(D_c, H_c) in CR-Prolog, a variant of Answer Set Prolog