Can Co-robots Learn to Teach?

Harshal Maske, Emily Kieson, Girish Chowdhary, and Charles Abramson

Abstract—We explore beyond existing work on learning from demonstration by asking the question: "Can robots learn to teach?" That is, can a robot autonomously learn an instructional policy from expert demonstration and use it to instruct or collaborate with humans in executing complex tasks in uncertain environments? In this paper we pursue a solution to this problem by leveraging the idea that humans often implicitly decompose a higher-level task into several subgoals whose execution brings the task closer to completion. We propose a Dirichlet-process-based non-parametric Inverse Reinforcement Learning (DPMIRL) approach for reward-based unsupervised clustering of the task space into subgoals. This approach is shown to capture the latent subgoals that a human teacher would have utilized to train a novice. The notion of an "action primitive" is introduced as a means to communicate the instructional policy to humans in the least complicated manner, and as a computationally efficient tool to segment demonstration data. We evaluate our approach through experiments on a hydraulically actuated scale model of an excavator, and evaluate and compare the different teaching strategies utilized by the robot.

I. INTRODUCTION

In many real-world robotic applications, human operators play a critical role in ensuring the safety and efficiency of the task. Examples include heavy construction and agricultural robotics, where human operators of co-robots such as excavators, tractors, and backhoes must make safety-critical decisions in real time under uncertain and dynamically changing environments. The skill gap between expert and novice operators of these robots is a significant limiting factor in ensuring safety, efficiency, and quality at work sites.
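As a rough, hypothetical intuition for the nonparametric clustering the abstract refers to: in a Dirichlet-process-style model, the number of subgoal clusters is not fixed in advance but grows as demonstration states appear far from all existing clusters. DPMIRL itself clusters on inferred rewards (Sec. III); the single-pass DP-means-style sketch below instead clusters raw 2-D task-space points, with the `penalty` threshold and example data invented purely for illustration.

```python
# Hypothetical single-pass DP-means-style sketch: a new subgoal cluster is
# opened whenever a demonstration state lies farther than `penalty` from
# every existing cluster center. This is an illustration of the "number of
# clusters grows with the data" idea, not the paper's DPMIRL algorithm.
import math

def dp_means(points, penalty):
    """Assign each point to the nearest cluster center, opening a new
    cluster whenever the nearest center is farther than `penalty`."""
    centers = [points[0]]
    labels = []
    for p in points:
        dists = [math.dist(p, c) for c in centers]
        i = min(range(len(centers)), key=dists.__getitem__)
        if dists[i] > penalty:
            centers.append(p)          # open a new subgoal cluster
            labels.append(len(centers) - 1)
        else:
            labels.append(i)
    return centers, labels

# Two well-separated groups of task-space states yield two subgoal clusters.
demo = [(0.0, 0.0), (0.1, 0.0), (5.0, 5.0), (5.1, 4.9)]
centers, labels = dp_means(demo, penalty=1.0)
```

A full DP-means algorithm would iterate assignment and center updates to convergence; the single pass above is kept deliberately minimal.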
If a co-robot were able to learn from experts and utilize that knowledge to assist or teach novice operators, significant performance gains could be achieved. In this paper, we study the crucial problem of directly learning instruction policies for novice operators from demonstrations provided by skilled operators. Learning from Demonstration (LfD) has been widely studied in the context of robots learning to do a task from teacher demonstrations [2]. However, when a robot needs to teach a human operator, the robot must do much more than learn to imitate the demonstrated task. Rather, it has to simplify and decompose the task into human-understandable task primitives and communicate back the essential sequence of actions that guide the human learning from the robot. This brings us to the crucial question: how can robots learn to teach? We argue that there are

Harshal Maske and Asst. Prof. Girish Chowdhary are with the Distributed Autonomous Systems Laboratory (DASLAB), University of Illinois at Urbana-Champaign, {hmaske2@illinois.edu, girishc@illinois.edu}. Emily Kieson and Prof. Charles Abramson are with the Comparative Psychology Laboratory, Oklahoma State University, {kieson@okstate.edu, charles.abramson@okstate.edu}.

[Fig. 1 block diagram: Task Demos → Action Primitive Segmentation (Sec. III-A,B) → DPMIRL (Sec. III-C) → Instructional Policy Model (Sec. III-D), passing trajectories, segments, and subgoals & segments between stages; in real time (Sec. IV-C), the task-space configuration and joystick input from the ongoing task feed the Instruction Interface (Sec. III-E), which delivers instruction to the human operator.]

Fig. 1: The co-robot learns an instructional policy from expert demonstrations (off-line learning, shown in yellow). In real time (shown in green), the co-robot generates instruction/guidance for human operators based on the current task-space configuration and the instructional policy model.
two important aspects to answering this question. The first is the development of practical algorithms that allow a co-robot to extract the latent subgoals or skills that a human teacher would have utilized to instruct other humans. The second is the development of feedback strategies for providing appropriate task-specific guidance to the human based on the current task state. The approach formulated in this paper is designed to address both aspects and is shown through extensive experimentation to enable robots to teach complex tasks to human operators (see Fig. 1).

The main contribution of this paper is a method to directly learn an instructional policy from human demonstrations. We define an instructional policy as a feedback policy that utilizes the robot's current and past state to suggest the best set of future actions for proceeding with a given task. This should be contrasted with existing LfD work, which has focused primarily on the robot learning a policy for executing the task by itself. At the same time, our approach is highly scalable and generalizable, and has been demonstrated to work on a realistic LfD problem with multiple degrees of freedom and uncertain operating conditions. Hence, it can also be used in a pure LfD form for complex real-world robotic tasks, such as those often encountered in construction. To ensure scalability and generalizability as well as to simplify

arXiv:1611.07490v1 [cs.RO] 22 Nov 2016
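The feedback character of an instructional policy, as defined above, can be sketched in miniature: given the robot's current task-space state, look up the nearest learned subgoal and suggest the action primitive associated with it. All subgoal centers and primitive names below are invented for illustration; in the paper these are learned from expert demonstrations (Sec. III).

```python
# Minimal, hypothetical sketch of an instructional policy: a feedback map
# from the current task-space state to a suggested action primitive. The
# subgoal centers and primitive labels are made up for illustration only.
import math

# Hypothetical subgoal centers in a 2-D task space, each paired with the
# action primitive a learned policy might associate with that subgoal.
SUBGOALS = {
    (0.0, 0.0): "lower boom",
    (3.0, 1.0): "curl bucket",
    (6.0, 4.0): "raise and swing",
}

def instruct(state):
    """Suggest the action primitive of the subgoal nearest to `state`."""
    center = min(SUBGOALS, key=lambda c: math.dist(c, state))
    return SUBGOALS[center]

suggestion = instruct((2.7, 0.8))   # nearest subgoal center is (3.0, 1.0)
```

A real instructional policy would also condition on past states and segment boundaries rather than on the instantaneous state alone; the nearest-subgoal lookup is only the simplest feedback rule of this shape.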