A Clickable World: Behavior Selection Through Pointing and Context for Mobile Manipulation

Hai Nguyen, Advait Jain, Cressel Anderson, Charles C. Kemp

Abstract—We present a new behavior selection system for human-robot interaction that maps virtual buttons overlaid on the physical environment to the robot's behaviors, thereby creating a clickable world. The user clicks on a virtual button, and activates the associated behavior, by briefly illuminating a corresponding 3D location with an off-the-shelf green laser pointer. As we have described in previous work, the robot can detect this click and estimate its 3D location using an omnidirectional camera and a pan/tilt stereo camera. In this paper, we show that the robot can select the appropriate behavior to execute using the 3D location of the click, the context around this 3D location, and its own state. For this work, the robot performs this selection process using a cascade of classifiers. We demonstrate the efficacy of this approach with an assistive object-fetching application. Through empirical evaluation, we show that the 3D location of the click, the state of the robot, and the surrounding context are sufficient for the robot to choose the correct behavior from a set of behaviors and perform the following tasks: pick up a designated object from a floor or table, deliver an object to a designated person, place an object on a designated table, go to a designated location, and touch a designated location with its end effector.

I. INTRODUCTION

For assistive robots, the ability to correctly decipher user commands is essential for performing useful services. Many methods have been proposed for human-robot interaction, but none has thus far been adopted extensively.
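The abstract's selection process, mapping a 3D click location, its surrounding context, and the robot's state to one of several behaviors through a cascade of classifiers, can be illustrated with a minimal sketch. The paper does not specify this cascade at the code level, so all names, the context labels, and the per-stage decision rules below are our own illustrative assumptions, not the authors' implementation:

```python
# Illustrative sketch (not the authors' code): a cascade of classifiers
# for behavior selection. Each stage either commits to a behavior or
# defers to the next stage; a default behavior handles unmatched clicks.
from dataclasses import dataclass
from typing import Callable, List, Optional

@dataclass
class Click:
    x: float           # estimated 3D click location (meters)
    y: float
    z: float
    surface: str       # assumed context label, e.g. "floor", "table", "person"

@dataclass
class RobotState:
    holding_object: bool

Stage = Callable[[Click, RobotState], Optional[str]]

def select_behavior(click: Click, state: RobotState, stages: List[Stage]) -> str:
    """Run the cascade: first stage to return a behavior wins."""
    for stage in stages:
        behavior = stage(click, state)
        if behavior is not None:
            return behavior
    return "go_to_location"  # default when no stage claims the click

def grasp_stage(click: Click, state: RobotState) -> Optional[str]:
    # Empty-handed robot + click on a floor or table -> pick up the object.
    if not state.holding_object and click.surface in ("floor", "table"):
        return "pick_up_object"
    return None

def delivery_stage(click: Click, state: RobotState) -> Optional[str]:
    # Robot holding an object + click near a person -> deliver it.
    if state.holding_object and click.surface == "person":
        return "deliver_object"
    return None

def place_stage(click: Click, state: RobotState) -> Optional[str]:
    # Robot holding an object + click on a table -> place it there.
    if state.holding_object and click.surface == "table":
        return "place_object"
    return None

STAGES: List[Stage] = [grasp_stage, delivery_stage, place_stage]
```

The same click can trigger different behaviors depending on robot state: a click on a table maps to `pick_up_object` when the gripper is empty but `place_object` when an object is held, which is the key property the cascade exploits.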
Interfaces based on the traditional WIMP (windows, icons, menus, pointers) model are often criticized as an unnatural mode of interaction, while natural interfaces based on speech or gestures are plagued by performance problems in realistic environments. To cope with these difficulties, we present a new human-robot interaction system in which the physical world is viewed as having overlaid virtual buttons that trigger robotic behaviors when clicked by the user.

In general, these virtual buttons can be clicked by providing a 3D location to the robot. For this work, the user clicks these virtual buttons using an uninstrumented laser pointer. As we have previously described in [8], our robot El-E has a laser-pointer interface that detects when a user illuminates a location in the environment and estimates its 3D location. We previously validated this approach in the context of object grasping and a preliminary object-fetching application [10]. Within this paper, we generalize this approach to form a clickable world interface and demonstrate its efficacy in the context of a full assistive object-fetching application designed for motor-impaired individuals.

Charles C. Kemp is with the Faculty of Biomedical Engineering at Georgia Tech. charlie.kemp@bme.gatech.edu

Fig. 1. A clickable world interface enables a user to trigger appropriate robotic behaviors by clicking on virtual buttons using a laser pointer.

We first discuss the relationship to previous work in Section II. Then, in Sections III and IV, we describe our robot along with details of the clickable world interface as it applies to assistive robots. To evaluate the effectiveness of the system at selecting appropriate behaviors, we present experiments and associated results in Sections V and VI. Finally, we close with concluding remarks.

II. RELATED WORK

Several other examples of intelligent pointing devices exist, such as Patel and Abowd's iCam augmented reality system [12].
In this system, users could virtually annotate an environment using a handheld computer containing a laser pointer, a camera, and sensors that determined the computer's position relative to a localization system installed in the environment. The XWand [15] and WorldCursor [14], developed at Microsoft Research, allow people to select locations in the environment. The XWand is a wand-like device that enables the user to point at an object in the environment and control it using gestures and voice commands. For example, lights can be turned on and off by pointing at the switch and saying "turn on" or "turn off", and a media player can be controlled by pointing at it and giving spoken commands such as "volume up", "play", etc. This work is similar in spirit to ours: the object to be acted upon is selected using the XWand, and a simple command specifies what task is to be performed. For our work, having a robot perform tasks avoids the need for specialized, networked, computer-