A gestural language recognition methodology for
human-robot interaction
Raoni M. Resende
ATAN Automation Systems,
Belo Horizonte, MG 30130-008, Brasil.
raoni.resende@atan.com.br
Guilherme A. S. Pereira*, Carlos A. Maia*, and Rodrigo L. Carceroni†
*Departamento de Engenharia Elétrica,
†Departamento de Ciência da Computação,
Universidade Federal de Minas Gerais, Belo Horizonte, MG 31270-901, Brasil.
gpereira@ufmg.br, maia@cpdee.ufmg.br, carceron@dcc.ufmg.br
Abstract— This paper presents the ongoing development of a
computer vision based human-robot interface. We propose the
use of efficient, well known computer vision algorithms and
context sensitive language recognition tools. In order to be robust
to the misinterpretations that are common in vision-based sign
recognition, we subdivide a robot command into small visual symbols,
which are easier to identify than a single complex gesture, and apply
a finite-state machine to interpret sequences of such symbols. Since
the output of our vision system is stochastic, we map this machine to
a Markov chain and use it to process the vision events and recognize
a command. Preliminary experimental results suggest that this
methodology will yield robust recognition with a low occurrence
of false positives.
I. INTRODUCTION
Service and personal robotics are currently among the
most active fields of research in robotics. Some practical
applications of such robots include cleaning and housekeeping [1],
agriculture [2], medicine [3], and search and rescue [4]. A
common characteristic of these applications is that the robots
work close to their human users, interacting with them directly
and augmenting their skills in a natural synergism. In order to
fulfill these requirements, besides a high
degree of autonomy to execute tasks without close supervision
of the operator, the robot must be provided with a friendly
human-robot interface that will allow the operator to give
commands in a natural and intuitive fashion.
Visual gesturing constitutes a natural way of communi-
cation. In robotics, computer vision gesture interpretation
presents advantages over traditional methods. Because no
special hardware or cables are required, the operator is free to
move inside the camera's field of view and even perform
other tasks while controlling the robot. The major drawbacks
of visual interfaces are the difficulties inherent to the
computer vision algorithms themselves, such as the trade-off
between robustness and computational efficiency.
Several authors have proposed different gesture recognition
methodologies. Bobick and Wilson [5] presented a state-based
approach where each gesture is modeled as a sequence of
states in a configuration space. Training data must be manually
segmented and temporally aligned. The data is represented by
a prototype curve that is parameterized according to a chosen
arc length. The prototype curve segments are used to define
fuzzy states representing the phases of each gesture. Hong et
al. [6] recognized dynamic gestures using finite state machines
(FSMs) and a variation of the Knuth-Morris-Pratt algorithm.
The training data was segmented and the FSMs were built in
a semi-automatic way.
This paper presents the ongoing development of a computer
vision based human-robot interface. We propose the use of
efficient, well known computer vision algorithms and language
recognition tools. In our approach, simple motion gestures are
considered to be characters of an alphabet. As in a natural
language, sequences of such gestures constitute words, which
can also be combined to form phrases. We consider words
and phrases (but not characters) to be commands to the
robot. Because our vocabulary is limited, this approach remains
robust to poor gesture (character) recognition, which is
very common in dynamic, cluttered environments.
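To make the idea concrete, the following sketch (our illustration, not the paper's implementation; the gesture alphabet and command vocabulary are hypothetical) builds a finite-state machine whose states are the prefixes of the command words and feeds it a stream of gesture characters:

```python
# Hypothetical vocabulary: each command ("word") is a sequence of simple
# motion gestures ("characters"). These names are illustrative only.
COMMANDS = {
    ("up", "up"): "move_forward",
    ("left", "right"): "stop",
    ("up", "down", "up"): "return_home",
}

def build_fsm(commands):
    """Build a trie-shaped FSM whose states are gesture prefixes."""
    transitions, accepting = {}, {}
    for word, action in commands.items():
        state = ()
        for symbol in word:
            nxt = state + (symbol,)
            transitions[(state, symbol)] = nxt
            state = nxt
        accepting[state] = action
    return transitions, accepting

def recognize(stream, transitions, accepting):
    """Run a gesture stream through the FSM, emitting recognized commands."""
    recognized, state = [], ()
    for symbol in stream:
        nxt = transitions.get((state, symbol))
        if nxt is None:  # unknown continuation: retry this symbol from the start
            nxt = transitions.get(((), symbol), ())
        state = nxt
        if state in accepting:
            recognized.append(accepting[state])
            state = ()  # command issued; wait for the next word
    return recognized

transitions, accepting = build_fsm(COMMANDS)
print(recognize(["up", "up", "left", "right"], transitions, accepting))
# prints ['move_forward', 'stop']
```

A Knuth-Morris-Pratt-style failure function, as in Hong et al. [6], would handle overlapping word prefixes more thoroughly; the restart-and-retry above is the simplest approximation.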
II. GESTURE AND COMMAND RECOGNITION
The algorithm proposed in this work can be split into two
main modules: Gesture Identification and Command Recognition.
These modules are based on character and language
recognition, respectively, and are described in the next
subsections.
A. Gesture Identification
Computer vision methods for gesture recognition usually
produce errors that are critical for some applications. This
work addresses that issue by using a discrete event system
(DES) framework [7] to increase the robustness of the interface.
With the same objective in mind, the interface is based on
simple motion gestures. The gesture identification module
identifies each gesture and associates the identification with
a probability. This information is transmitted to the command
recognition module, which processes each gesture according
to its probability and context to decide whether any action
must be taken.
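The paper does not specify how these probabilities are combined, so the sketch below is only one plausible reading: the recognizer walks a single hypothetical command word, multiplies the per-gesture probabilities reported by the vision module, and accepts the command only when the joint confidence clears a threshold. The word, threshold, and combination rule are all our assumptions.

```python
# Each vision event is a (gesture, probability) pair from the gesture
# identification module. We match events against one hypothetical command
# word and accumulate confidence as a product of probabilities.
COMMAND_WORD = ("up", "up", "left")  # illustrative command, not from the paper
THRESHOLD = 0.5                      # minimum joint confidence to act

def process_events(events, word=COMMAND_WORD, threshold=THRESHOLD):
    position, confidence = 0, 1.0
    for gesture, prob in events:
        if gesture == word[position]:
            position, confidence = position + 1, confidence * prob
        elif gesture == word[0]:     # mismatch, but the word may restart here
            position, confidence = 1, prob
        else:                        # mismatch: abandon the partial word
            position, confidence = 0, 1.0
        if position == len(word):
            if confidence >= threshold:
                return confidence    # command accepted with this confidence
            position, confidence = 0, 1.0  # too uncertain: discard and go on
    return None                      # no command recognized
```

Mapping the full FSM to a Markov chain, as the abstract describes, generalizes this idea: each transition carries the observation probability, and command acceptance depends on the accumulated path probability.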
In this early work, a gesture (considered, at this stage, simply
as a direction of movement) is identified by tracking an object
of interest, which is detected through color segmentation. Our
final goal is to improve this process so that any human can
issue commands naturally, without the need for any object,
colorful or not. Techniques such as optical flow are being
considered, even though preliminary tests have not been
successful.
To identify a gesture, the following steps are performed:
1) Color segmentation;
2) Connected components labeling and identification of the
object of interest based on the blobs' area and position;
Proceedings of the 2006 IEEE International Conference on Robotics and Automation
Orlando, Florida - May 2006
0-7803-9505-0/06/$20.00 ©2006 IEEE 4333
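At a toy scale, the two listed steps can be sketched in plain Python: a per-pixel color threshold (step 1) followed by a flood-fill connected-components pass that keeps the largest blob (step 2). This is our illustration only; a real implementation would rely on an image-processing library, and the color tolerance below is an arbitrary choice.

```python
from collections import deque

def segment(image, target, tol=30):
    """Step 1: binary mask of pixels whose RGB color is within tol of target."""
    h, w = len(image), len(image[0])
    return [[all(abs(image[y][x][c] - target[c]) <= tol for c in range(3))
             for x in range(w)] for y in range(h)]

def largest_blob_centroid(mask):
    """Step 2: label 4-connected components and return the centroid (x, y)
    of the largest blob, or None if the mask is empty."""
    h, w = len(mask), len(mask[0])
    seen = [[False] * w for _ in range(h)]
    best = []
    for y in range(h):
        for x in range(w):
            if mask[y][x] and not seen[y][x]:
                blob, queue = [], deque([(x, y)])
                seen[y][x] = True
                while queue:  # breadth-first flood fill of one component
                    cx, cy = queue.popleft()
                    blob.append((cx, cy))
                    for nx, ny in ((cx+1, cy), (cx-1, cy), (cx, cy+1), (cx, cy-1)):
                        if 0 <= nx < w and 0 <= ny < h and mask[ny][nx] and not seen[ny][nx]:
                            seen[ny][nx] = True
                            queue.append((nx, ny))
                if len(blob) > len(best):
                    best = blob
    if not best:
        return None
    return (sum(p[0] for p in best) / len(best),
            sum(p[1] for p in best) / len(best))
```

Tracking the returned centroid from frame to frame then yields the direction of movement that serves as the gesture character.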