Human-Robot Interface Based on Speech
Understanding Assisted by Vision
Shengshien Chong¹, Yoshinori Kuno¹,², Nobutaka Shimada¹ and Yoshiaki Shirai¹
¹ Department of Computer-Controlled Mechanical Systems, Osaka University, Japan
² Department of Information and Computer Sciences, Saitama University, Japan
Abstract. Speech recognition provides a natural and familiar interface for
human beings to pass on information. For this reason, it is likely to be used as
the human interface in service robots. However, for the robot to act in
accordance with what the user tells it, it must draw on information other
than that obtained from speech input. First, we look at a widely discussed
problem in natural language processing: communication is abbreviated because
the parties share a common context. In addition, another problem exists for
a robot: the lack of information linking symbols in the robot's world to
things in the real world. Here, we propose a method that uses image processing
to supply the information that language processing alone lacks and that the
robot needs to carry out the action. When image processing fails, the robot
asks the user directly and uses his/her answer to help it achieve its task.
We confirm our theories through experiments both in simulation and on a
real robot, and test their reliability.
Keywords: human-robot interface; speech understanding; vision-based interface;
service robot; face recognition
1. Introduction
As the number of senior citizens increases, more research effort is being aimed at
developing service robots for use in welfare services. However, these
developments depend very much on human-interface technology. The interface should
allow even handicapped persons to give commands to the robot in a simple
and natural way. The demand for user-friendliness leads naturally to a
dialogue-controlled speech interface, which enables everyone to communicate easily
with the robot. This is needed not only for convenience but also for lowering the
inhibition threshold for using the robot, which might still be an obstacle to
widespread use. We want the robot not only to understand robotic commands but also
to understand human-like commands and to be more flexible in its language
understanding.
For severely handicapped persons, who are unable to use keyboards or touch screens,
speech understanding and dialogue are among the main prerequisites. Moreover, the
quality of speech recognition technology has improved in
recent years, and we foresee that it will be widely used in the near
future as it improves further. It is already possible to operate a computer
by voice. However, there is a need to memorize those commands and
T. Tan, Y. Shi, and W. Gao (Eds.): ICMI 2000, LNCS 1948, pp. 16-23, 2000.
© Springer-Verlag Berlin Heidelberg 2000