Mobile Human-Robot Teaming with Environmental Tolerance

Matthew M. Loper, Computer Science Dept., Brown University, 115 Waterman St., Providence, RI 02912, matt@cs.brown.edu
Nathan P. Koenig, Computer Science Dept., University of Southern California, 941 W. 37th Place, Los Angeles, CA 90089, nkoenig@usc.edu
Sonia H. Chernova, Computer Science Dept., Carnegie Mellon University, 5000 Forbes Ave, Pittsburgh, PA 15213, soniac@cs.cmu.edu
Chris V. Jones, Research Group, iRobot Corporation, 8 Crosby Dr., Bedford, MA, cjones@irobot.com
Odest C. Jenkins, Computer Science Dept., Brown University, 115 Waterman St., Providence, RI 02912, cjenkins@cs.brown.edu

ABSTRACT
We demonstrate that structured-light-based depth sensing with standard perception algorithms can enable mobile peer-to-peer interaction between humans and robots. We posit that recently emerging devices for depth-based imaging can enable robot perception of non-verbal cues in human movement in the face of lighting and minor terrain variations. Toward this end, we have developed an integrated robotic system capable of person following and of responding to verbal and non-verbal commands under varying lighting conditions and on uneven terrain. The feasibility of our system for peer-to-peer HRI is demonstrated through two trials in indoor and outdoor environments.

Categories and Subject Descriptors
I.2.9 [Artificial Intelligence]: Robotics—Operator Interfaces; I.4.8 [Image Processing and Computer Vision]: Scene Analysis—Range data, Tracking; I.5.4 [Pattern Recognition]: Applications—Computer vision

General Terms
Design, Human Factors

Keywords
Human-robot interaction, person following, gesture recognition

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page.
To copy otherwise, to republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. HRI'09, March 11–13, 2009, La Jolla, California, USA. Copyright 2009 ACM 978-1-60558-404-1/09/03 ...$5.00.

1. INTRODUCTION
Mobile robots show great promise in assisting people in a variety of domains, including medical, military, recreational, and industrial applications [22]. However, if robot assistants are to become ubiquitous, teleoperation cannot be the only means of robot control. Teleoperation interfaces may require training, can be physically encumbering, and can be detrimental to users' situation awareness. In this paper, we present an alternative approach to interaction and control.

Specifically, we show the feasibility of active-light depth sensing for mobile person following and gesture recognition; such a sensor has strong potential for reducing the perceptual (and computational) burden of tasks involving person following and observation. The most essential aspect of our system is the reliability with which accurate silhouettes can be extracted from active depth imaging. Our system is augmented with voice recognition and a simple state-based behavior system for when the user is out of visual range.

Existing approaches integrate vision, speech recognition, and laser-based sensing to achieve human-robot interaction [20, 14, 6, 8]. Other recent approaches focus specifically on person following [16, 5], gesture-based communication [10, 7, 23], or voice-based operation [11, 15]. However, these systems are typically either designed for indoor environments or do not incorporate both following and gesture recognition. Our approach aims to further the field with viability in both indoor and outdoor environments through active sensing, robust perception mechanisms, and a ruggedized platform.
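To illustrate why active depth imaging eases silhouette extraction, the following is a minimal sketch of depth-band thresholding: foreground pixels are those whose range falls within a plausible person distance, while background and invalid (zero-return) pixels are suppressed. The function name and the near/far thresholds are illustrative assumptions, not the paper's actual segmentation pipeline.

```python
import numpy as np

def extract_silhouette(depth, near=0.5, far=2.5):
    """Return a binary person silhouette from a depth image (in meters).

    Pixels whose depth falls inside [near, far] are treated as
    foreground; background surfaces and zero-valued dropout pixels
    (no sensor return) are suppressed. Thresholds are hypothetical.
    """
    valid = depth > 0                       # 0 = no return from the sensor
    return valid & (depth >= near) & (depth <= far)

# Synthetic 8x8 depth frame: background wall at 4 m, a "person"
# blob at 1.5 m, and one dropout pixel.
frame = np.full((8, 8), 4.0)
frame[2:7, 3:6] = 1.5
frame[0, 0] = 0.0

mask = extract_silhouette(frame)            # 5x3 foreground block
```

In practice the largest connected component of such a mask would be taken as the person silhouette; the key point is that a single per-pixel range test replaces the appearance modeling that color-based segmentation needs under varying lighting.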
One promising approach to pose estimation via range imaging, by Knoop et al. [9], uses an articulated model and iterative closest point search. However, their focus is on pose tracking (unlike our gesture recognition), and they make additional assumptions about initial pose alignment.

At an abstract level, we strive for environmental tolerance: the ability of a system to work in a variety of conditions and locales. Such tolerance can take many forms, and we do not make blanket claims of robustness against all forms of environmental variance. Our methods are meant