Human-Robot Interface Based on Speech Understanding Assisted by Vision

Shengshien Chong¹, Yoshinori Kuno¹,², Nobutaka Shimada¹ and Yoshiaki Shirai¹

¹ Department of Computer-Controlled Mechanical Systems, Osaka University, Japan
² Department of Information and Computer Sciences, Saitama University, Japan

Abstract. Speech recognition provides a natural and familiar interface for human beings to pass on information. For this reason, it is likely to be used as the human interface in service robots. However, for the robot to act in accordance with what the user tells it, it must draw on information beyond what the speech input provides. First, we look at a widely discussed problem in natural language processing: parties who share a common context abbreviate their communication. In addition, a robot faces another problem: it lacks information linking the symbols in its world to things in the real world. Here, we propose a method that uses image processing to supply the information that language processing alone lacks and that the robot needs to carry out the action. When image processing fails, the robot asks the user directly and uses his/her answer to help it achieve its task. We confirm our theories by performing experiments both in simulation and on a real robot, and test their reliability.

Keywords: human-robot interface; speech understanding; vision-based interface; service robot; face recognition

1. Introduction

As the number of senior citizens increases, more research efforts have been aimed at developing service robots for use in welfare services. However, these developments depend heavily on human interface technology. The interface should allow even handicapped persons to give commands to the robot in a simple and natural way. The demand for user-friendliness leads naturally to a dialogue-controlled speech interface, which enables everyone to communicate easily with the robot.
Such an interface is needed not only for convenience but also to lower the inhibition threshold for using the robot, which might still be an obstacle to widespread use. We want the robot to understand not only robotic commands but also human-like commands, and to be more flexible in its language understanding. For severely handicapped persons who are unable to use keyboards or touch screens, speech understanding and dialogue are among the main prerequisites. Moreover, the quality of speech recognition technology has improved in recent years, and we foresee that it will be widely used in the future as it improves further. It is already possible to operate a computer by voice. However, there is a need to memorize those commands and

T. Tan, Y. Shi, and W. Gao (Eds.): ICMI 2000, LNCS 1948, pp. 16-23, 2000. © Springer-Verlag Berlin Heidelberg 2000