Hybrid Neural-Based Guiding System
for Mobile Robots
P. Sánchez, P. Melin, M. A. López
Division of Research and Graduate Studies
Tijuana Institute of Technology, Tijuana, Mexico
PatriciaSanchez@ieee.org , pmelin@tectijuana.mx , mlopez@tectijuana.mx
Abstract – A hybrid system is a dynamical system with both
discrete and continuous state changes such as those that combine
neural networks and fuzzy logic. In this paper, we propose a
method for voice and image recognition by implementing
optimized neural networks and fuzzy logic to guide a distributed
robot. Generally, word recognition systems are divided into three
stages: segmentation, feature extraction and classification. For
feature extraction we use a speech-processing method based on
Mel Frequency Cepstral Coefficients (MFCC).
Genetic Algorithms (GA) are used for the optimization process in
order to improve image recognition. The robot’s world is a white
square area measuring 2 square meters. The robot receives a
voice request for a geometric solid and must search among
the different solids to find the one requested. It must then
direct itself to the solid using a fuzzy guiding system.
I. INTRODUCTION
This paper describes a hybrid neural-based guiding system for
mobile robots that takes advantage of combining soft
computing techniques, such as neural networks, fuzzy logic and
genetic algorithms.
Pattern recognition has been studied in connection with many
different (and mainly unrelated) applications, such as
classifying galaxies by shape, identifying fingerprints and
recognizing speech.
Human expertise in these and many similar problems is
being supplemented by computer-based procedures, especially
artificial neural networks (ANNs). Pattern recognition is
extremely widely used, often under the names of
“classification”, “diagnosis” or “learning from examples” [1].
ANNs attempt to replicate the computational power (low-level
arithmetic processing ability) of biological neural
networks and, thereby, hopefully endow machines with some
of the (higher-level) cognitive abilities that biological
organisms possess (due in part, perhaps, to their low-level
computational prowess).
Nevertheless, an impediment to a more widespread
acceptance of ANNs is the absence of a capability to explain
to the user, in a human-comprehensible form, how the network
arrives at a particular decision. Nor can one say anything
about the knowledge encoded within the black box. Recently,
there has been widespread activity aimed at addressing this
situation by extracting the embedded knowledge in trained
ANNs in the form of symbolic rules [2]-[4].
MFCCs are coefficients that represent audio, and they improve
the speech recognition rate of ANNs. They are derived
from a type of cepstral representation of the audio clip (a
"spectrum-of-a-spectrum"). The frequency bands are
positioned logarithmically (on the Mel scale), which
approximates the human auditory system's response more
closely than the linearly spaced frequency bands obtained
directly from the Fast Fourier Transform (FFT).
This allows better processing of data, for example, in
audio compression [5]. For this reason we include this type of
processing in the speech recognition ANN.
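The logarithmic band placement described above can be illustrated with a minimal Python sketch. It is not the implementation used in this work: it assumes the widely used conversion mel = 2595 log10(1 + f/700), and the function names, the 0-8000 Hz range and the choice of 10 bands are hypothetical values picked for illustration only.

```python
import math

def hz_to_mel(f_hz):
    """Convert a frequency in Hz to the Mel scale (assumed common formula)."""
    return 2595.0 * math.log10(1.0 + f_hz / 700.0)

def mel_to_hz(m):
    """Inverse of hz_to_mel."""
    return 700.0 * (10.0 ** (m / 2595.0) - 1.0)

def mel_band_centers(f_min, f_max, n_bands):
    """Center frequencies (Hz) of n_bands spaced equally on the Mel scale."""
    m_min, m_max = hz_to_mel(f_min), hz_to_mel(f_max)
    step = (m_max - m_min) / (n_bands + 1)
    return [mel_to_hz(m_min + step * (i + 1)) for i in range(n_bands)]

# Hypothetical analysis range and band count, chosen only for the example.
centers = mel_band_centers(0.0, 8000.0, 10)
for c in centers:
    print(round(c, 1))
```

Bands that are equally spaced on the Mel scale become progressively wider in Hz, so low frequencies are resolved more finely than high ones, unlike the evenly spaced bins of an FFT.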
One of the main difficulties of creating ANNs is finding
a suitable set of parameters and topology, and for this
purpose one of the most commonly used optimization methods
is the GA. The use of both genetic algorithms and artificial neural
networks was originally motivated by the astonishing success
of these concepts in their biological counterparts.
A GA, essentially, is a method of "breeding" computer
programs and solutions for optimization and search problems by
means of simulated evolution. Processes based on natural
selection, crossover, and mutation are repeatedly applied to a
population of binary strings which represent potential
solutions. Over time, the number of above-average
individuals increases, and better-fit individuals are created,
until a good solution to the problem at hand is found [6], [7].
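The loop of selection, crossover and mutation over binary strings can be sketched in Python as follows. This is only an illustration under stated assumptions, not the optimizer used in this work: the fitness function (counting 1s), tournament selection, one-point crossover, elitism, and all parameter values are hypothetical choices made to keep the example small.

```python
import random

def one_max(bits):
    """Toy fitness (hypothetical stand-in objective): number of 1s."""
    return sum(bits)

def tournament(pop, fitness, rng, k=3):
    """Selection: return the fittest of k randomly drawn individuals."""
    return max(rng.sample(pop, k), key=fitness)

def crossover(a, b, rng):
    """One-point crossover: head of parent a joined to tail of parent b."""
    point = rng.randrange(1, len(a))
    return a[:point] + b[point:]

def mutate(bits, rng, rate=0.02):
    """Flip each bit independently with probability `rate`."""
    return [1 - b if rng.random() < rate else b for b in bits]

def run_ga(fitness, n_bits=20, pop_size=30, generations=40, seed=0):
    """Evolve a population of binary strings; return (initial best, final best)."""
    rng = random.Random(seed)
    pop = [[rng.randint(0, 1) for _ in range(n_bits)] for _ in range(pop_size)]
    initial_best = max(pop, key=fitness)
    for _ in range(generations):
        elite = max(pop, key=fitness)  # elitism: carry the best one over unchanged
        children = [
            mutate(crossover(tournament(pop, fitness, rng),
                             tournament(pop, fitness, rng), rng), rng)
            for _ in range(pop_size - 1)
        ]
        pop = [elite] + children
    return initial_best, max(pop, key=fitness)

initial_best, best = run_ga(one_max)
print(one_max(initial_best), one_max(best))
```

To optimize an ANN, the binary string would instead encode network parameters or topology, and the fitness function would be a measure of recognition performance; that encoding is not shown here.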
Fuzzy logic allows set membership values to range
(inclusively) between 0 and 1 and, in its linguistic form,
captures imprecise concepts like "slightly", "quite" and "very". It
allows partial membership in a set and is related to fuzzy sets
and possibility theory [8].
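Partial membership can be illustrated with a minimal Python sketch. The triangular membership function, the example variable (a distance in arbitrary units) and the modeling of the hedge "very" as squaring the membership degree are assumptions made for illustration; they are a common textbook convention, not the rule base of the guiding system described in this paper.

```python
def triangular(x, a, b, c):
    """Membership degree in a triangular fuzzy set with feet a, c and peak b."""
    if x <= a or x >= c:
        return 0.0
    if x <= b:
        return (x - a) / (b - a)
    return (c - x) / (c - b)

def very(mu):
    """Linguistic hedge 'very', modeled here (by assumption) as squaring."""
    return mu ** 2

# Hypothetical fuzzy set "medium distance" on a 0-10 scale.
for distance in (0.0, 2.5, 5.0, 7.5, 10.0):
    mu = triangular(distance, 0.0, 5.0, 10.0)
    print(distance, mu, very(mu))
```

A value such as 2.5 belongs to the set only to degree 0.5, which is the kind of graded membership a crisp set cannot express.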
II. ARTIFICIAL NEURAL NETWORKS FOR PATTERN
RECOGNITION
Humans use sensory-motor information to communicate, and
the use of this information is essential for a friendly, assertive
and precise interaction. This means that we need to create
friendly computers with more natural interfaces [9].
Speech recognition is a well-studied field, yet it still
needs to be improved and implemented in different ways.
Nowadays there are robots that can interact with people, e.g.,
Papero [10].
Nonetheless, many real-world problems require a degree
of flexibility that is difficult to achieve using hand-programmed
algorithms. One such domain is vision-based recognition,
which is primarily associated with images; ANNs are mainly used for
character, voice and face recognition. In this paper they are used for
speech and image recognition [11].
The purpose of this part of the work is to make a real-time
voice request to the computer; the computer receives the audio,
processes it and decides which solid was requested, through the ANN
978-1-4244-2352-1/08/$25.00 ©2008 IEEE