A gestural language recognition methodology for
human-robot interaction
Raoni M. Resende
ATAN Automation Systems,
Belo Horizonte, MG 30130-008, Brasil.
raoni.resende@atan.com.br
Guilherme A. S. Pereira*, Carlos A. Maia*, and Rodrigo L. Carceroni†
*Departamento de Engenharia Elétrica,
†Departamento de Ciência da Computação,
Universidade Federal de Minas Gerais, Belo Horizonte, MG 31270-901, Brasil.
gpereira@ufmg.br, maia@cpdee.ufmg.br, carceron@dcc.ufmg.br
Abstract— This paper presents the ongoing development of a
computer vision based human-robot interface. We propose the
use of efficient, well known computer vision algorithms and
context sensitive language recognition tools. In order to be robust
to the misinterpretations that are common in vision-based sign
recognition, we subdivide a robot command into small visual symbols,
which are easier to identify than a single complex gesture, and apply
a finite-state machine to interpret sequences of such symbols. Since
the output of our vision system is stochastic, we map this machine to
a Markov chain and use it to process the vision events and recognize
a command. Preliminary experimental results suggest that this
methodology will yield robust recognition with a low occurrence
of false positives.
I. INTRODUCTION
Service and personal robotics are currently among the
most active fields of research in robotics. Some practical
applications of such robots include cleaning and housekeeping [1],
agriculture [2], medicine [3], and search and rescue [4]. A
common characteristic of these applications is that the robots
work close to their human users, interacting with them directly
and augmenting their skills in a natural synergism. In order to
fulfill these requirements, besides a high
degree of autonomy to execute tasks without close supervision
of the operator, the robot must be provided with a friendly
human-robot interface that will allow the operator to give
commands in a natural and intuitive fashion.
Visual gesturing constitutes a natural way of communi-
cation. In robotics, computer vision gesture interpretation
presents advantages over traditional methods. Because no
special hardware or cables are required, the operator is free to
move inside the camera's field of view and even perform
other tasks while controlling the robot. The major drawbacks
of visual interfaces are the difficulties inherent to the
computer vision algorithms themselves, such as the trade-off
between robustness and computational efficiency.
Several authors have proposed different gesture recognition
methodologies. Bobick and Wilson [5] presented a state-based
approach where each gesture is modeled as a sequence of
states in a configuration space. Training data must be manually
segmented and temporally aligned. The data is represented by
a prototype curve that is parameterized according to a chosen
arc length. The prototype curve segments are used to define
fuzzy states representing the phases of each gesture. Hong et
al. [6] recognized dynamic gestures using finite state machines
(FSMs) and a variation of the Knuth-Morris-Pratt algorithm.
The training data was segmented and the FSMs were built in
a semi-automatic way.
This paper presents the ongoing development of a computer
vision based human-robot interface. We propose the use of
efficient, well known computer vision algorithms and language
recognition tools. In our approach, simple motion gestures are
considered to be characters of an alphabet. As in a natural
language, sequences of such gestures constitute words, which
can also be combined to form phrases. We consider words
and phrases (but not characters) to be commands to the
robot. Because our vocabulary is limited, this approach remains
robust to poor gesture (character) recognition, which is
very common in dynamic, cluttered environments.
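To make the idea concrete, the following sketch (our illustration, not the paper's implementation; the gesture alphabet and command vocabulary are hypothetical) builds a finite-state machine whose states are the prefixes of the command words and feeds it a stream of gesture characters:

```python
# Hypothetical vocabulary: each command ("word") is a sequence of simple
# motion gestures ("characters"). These names are illustrative only.
COMMANDS = {
    ("up", "up"): "move_forward",
    ("left", "right"): "stop",
    ("up", "down", "up"): "return_home",
}

def build_fsm(commands):
    """Build a trie-shaped FSM whose states are gesture prefixes."""
    transitions, accepting = {}, {}
    for word, action in commands.items():
        state = ()
        for symbol in word:
            nxt = state + (symbol,)
            transitions[(state, symbol)] = nxt
            state = nxt
        accepting[state] = action
    return transitions, accepting

def recognize(stream, transitions, accepting):
    """Run a gesture stream through the FSM, emitting recognized commands."""
    recognized, state = [], ()
    for symbol in stream:
        nxt = transitions.get((state, symbol))
        if nxt is None:  # unknown continuation: retry this symbol from the start
            nxt = transitions.get(((), symbol), ())
        state = nxt
        if state in accepting:
            recognized.append(accepting[state])
            state = ()  # command issued; wait for the next word
    return recognized

transitions, accepting = build_fsm(COMMANDS)
print(recognize(["up", "up", "left", "right"], transitions, accepting))
# prints ['move_forward', 'stop']
```

A Knuth-Morris-Pratt-style failure function, as in Hong et al. [6], would handle overlapping word prefixes more thoroughly; the restart-and-retry above is the simplest approximation.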
II. GESTURE AND COMMAND RECOGNITION
The algorithm proposed in this work can be split into two
main modules: Gesture Identification and Command Recognition.
These modules are based on character and language
recognition, respectively, and are described in the next
subsections.
A. Gesture Identification
Computer vision methods for gesture recognition usually
produce errors that are critical for some applications. This
work addresses that issue by using a discrete event system
(DES) framework [7] to increase the robustness of the interface.
With the same objective in mind, the interface is based on
simple motion gestures. The gesture identification module
identifies each gesture and associates the identification with
a probability. This information is transmitted to the command
recognition module, which processes each gesture according
to its probability and context to decide whether any action
must be taken.
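The paper does not specify how these probabilities are combined, so the sketch below is only one plausible reading: the recognizer walks a single hypothetical command word, multiplies the per-gesture probabilities reported by the vision module, and accepts the command only when the joint confidence clears a threshold. The word, threshold, and combination rule are all our assumptions.

```python
# Each vision event is a (gesture, probability) pair from the gesture
# identification module. We match events against one hypothetical command
# word and accumulate confidence as a product of probabilities.
COMMAND_WORD = ("up", "up", "left")  # illustrative command, not from the paper
THRESHOLD = 0.5                      # minimum joint confidence to act

def process_events(events, word=COMMAND_WORD, threshold=THRESHOLD):
    position, confidence = 0, 1.0
    for gesture, prob in events:
        if gesture == word[position]:
            position, confidence = position + 1, confidence * prob
        elif gesture == word[0]:     # mismatch, but the word may restart here
            position, confidence = 1, prob
        else:                        # mismatch: abandon the partial word
            position, confidence = 0, 1.0
        if position == len(word):
            if confidence >= threshold:
                return confidence    # command accepted with this confidence
            position, confidence = 0, 1.0  # too uncertain: discard and go on
    return None                      # no command recognized
```

Mapping the full FSM to a Markov chain, as the abstract describes, generalizes this idea: each transition carries the observation probability, and command acceptance depends on the accumulated path probability.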
In this early work, a gesture (considered, at this stage, simply
as a direction of movement) is identified by tracking an object
of interest, which is detected through color segmentation. Our
final goal is to improve this process so that any human can
issue commands naturally, without the need for any object,
colorful or not. Techniques such as optical flow are being
considered, even though preliminary tests have not been
successful.
To identify a gesture, the following steps are performed:
1) Color segmentation;
2) Connected components labeling and identification of the
object of interest based on the blobs' area and position;
Proceedings of the 2006 IEEE International Conference on Robotics and Automation
Orlando, Florida - May 2006
0-7803-9505-0/06/$20.00 ©2006 IEEE 4333
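At a toy scale, the two listed steps can be sketched in plain Python: a per-pixel color threshold (step 1) followed by a flood-fill connected-components pass that keeps the largest blob (step 2). This is our illustration only; a real implementation would rely on an image-processing library, and the color tolerance below is an arbitrary choice.

```python
from collections import deque

def segment(image, target, tol=30):
    """Step 1: binary mask of pixels whose RGB color is within tol of target."""
    h, w = len(image), len(image[0])
    return [[all(abs(image[y][x][c] - target[c]) <= tol for c in range(3))
             for x in range(w)] for y in range(h)]

def largest_blob_centroid(mask):
    """Step 2: label 4-connected components and return the centroid (x, y)
    of the largest blob, or None if the mask is empty."""
    h, w = len(mask), len(mask[0])
    seen = [[False] * w for _ in range(h)]
    best = []
    for y in range(h):
        for x in range(w):
            if mask[y][x] and not seen[y][x]:
                blob, queue = [], deque([(x, y)])
                seen[y][x] = True
                while queue:  # breadth-first flood fill of one component
                    cx, cy = queue.popleft()
                    blob.append((cx, cy))
                    for nx, ny in ((cx+1, cy), (cx-1, cy), (cx, cy+1), (cx, cy-1)):
                        if 0 <= nx < w and 0 <= ny < h and mask[ny][nx] and not seen[ny][nx]:
                            seen[ny][nx] = True
                            queue.append((nx, ny))
                if len(blob) > len(best):
                    best = blob
    if not best:
        return None
    return (sum(p[0] for p in best) / len(best),
            sum(p[1] for p in best) / len(best))
```

Tracking the returned centroid from frame to frame then yields the direction of movement that serves as the gesture character.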