Hybrid Neural-Based Guiding System for Mobile Robots

P. Sánchez, P. Melin, M. A. López
Division of Research and Graduate Studies
Tijuana Institute of Technology, Tijuana, Mexico
PatriciaSanchez@ieee.org, pmelin@tectijuana.mx, mlopez@tectijuana.mx

Abstract – A hybrid system is a dynamical system with both discrete and continuous state changes, such as those that combine neural networks and fuzzy logic. In this paper, we propose a method for voice and image recognition that uses optimized neural networks and fuzzy logic to guide a distributed robot. Generally, word recognition systems are divided into three stages: segmentation, feature extraction and classification. For feature extraction from the voice signal, we use Mel Frequency Cepstral Coefficients (MFCC). Genetic Algorithms (GA) are used for the optimization process in order to improve image recognition. The robot’s world is a white square area measuring 2 square meters. The robot receives a voice request for a geometric solid and must search among the different solids to find the one requested. It must then direct itself to the solid using a fuzzy guiding system.

I. INTRODUCTION

This paper describes a hybrid neural-based guiding system for mobile robots that takes advantage of combining soft computing techniques, such as neural networks, fuzzy logic and genetic algorithms. Pattern recognition has been studied in many different (and mainly unrelated) applications, such as classifying galaxies by shape, identifying fingerprints and recognizing speech. Human expertise in these and many similar problems is being supplemented by computer-based procedures, especially artificial neural networks (ANNs). Pattern recognition is very widely used, often under the names of “classification”, “diagnosis” or “learning from examples” [1].
ANNs attempt to replicate the computational power (low-level arithmetic processing ability) of biological neural networks and thereby, hopefully, endow machines with some of the (higher-level) cognitive abilities that biological organisms possess (due in part, perhaps, to their low-level computational prowess). Nevertheless, an impediment to wider acceptance of ANNs is the absence of a capability to explain to the user, in a human-comprehensible form, how the network arrives at a particular decision. Nor can one say much about the knowledge encoded within the black box. Recently, there has been widespread activity aimed at addressing this situation by extracting the knowledge embedded in trained ANNs in the form of symbolic rules [2]-[4].

MFCCs are coefficients that represent audio, and they improve the speech recognition rate of ANNs. They are derived from a type of cepstral representation of the audio clip (a "spectrum-of-a-spectrum"). The frequency bands are positioned logarithmically (on the Mel scale), which approximates the human auditory system's response more closely than the linearly spaced frequency bands obtained directly from the Fast Fourier Transform (FFT). This allows better processing of data, for example, in audio compression [5]. For this reason, we include this type of processing in the speech recognition ANN.

One of the main difficulties in creating ANNs is finding a suitable set of parameters and topology, and one of the most commonly used optimization methods for this purpose is the GA. The combined use of genetic algorithms and artificial neural networks was originally motivated by the astonishing success of these concepts in their biological counterparts. A GA is, essentially, a method of "breeding" computer programs and solutions to optimization or search problems by means of simulated evolution.
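As a minimal illustration of such simulated evolution, the following Python sketch evolves binary strings with tournament selection, single-point crossover and bit-flip mutation. The fitness function (counting 1-bits), the function names and all parameter values are assumptions chosen for illustration; they are not the encoding or settings used in this work to optimize the neural networks.

```python
import random


def fitness(bits):
    # Toy objective (assumption): number of 1-bits in the string ("OneMax").
    return sum(bits)


def tournament(population, rng, k=3):
    # Selection: draw k random individuals and keep the fittest one.
    return max(rng.sample(population, k), key=fitness)


def crossover(a, b, rng):
    # Single-point crossover: prefix from parent a, suffix from parent b.
    point = rng.randrange(1, len(a))
    return a[:point] + b[point:]


def mutate(bits, rng, rate=0.01):
    # Bit-flip mutation: flip each bit independently with probability `rate`.
    return [1 - b if rng.random() < rate else b for b in bits]


def evolve(n_bits=32, pop_size=40, generations=60, seed=0):
    rng = random.Random(seed)
    population = [[rng.randint(0, 1) for _ in range(n_bits)]
                  for _ in range(pop_size)]
    for _ in range(generations):
        # Elitism: carry the current best individual over unchanged.
        children = [max(population, key=fitness)]
        while len(children) < pop_size:
            child = crossover(tournament(population, rng),
                              tournament(population, rng), rng)
            children.append(mutate(child, rng))
        population = children
    return max(population, key=fitness)


best = evolve()
print(fitness(best), "of", len(best), "bits set")
```

In an ANN setting, the bit string would instead encode network parameters or topology, and the fitness would be a recognition score; that mapping is not shown here.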
Processes based on natural selection, crossover, and mutation are repeatedly applied to a population of binary strings that represent potential solutions. Over time, the number of above-average individuals increases, and better-fit individuals are created, until a good solution to the problem at hand is found [6], [7].

Fuzzy logic allows set membership values to range (inclusively) between 0 and 1 and, in its linguistic form, supports imprecise concepts like "slightly", "quite" and "very". It allows partial membership in a set and is related to fuzzy sets and possibility theory [8].

II. ARTIFICIAL NEURAL NETWORKS FOR PATTERN RECOGNITION

Humans use sensor-motor information to communicate, and the use of this information is essential for a friendly, assertive and precise interaction. This means that we need to create friendly computers with more natural interfaces [9]. Speech recognition is a well-studied field, yet it still needs to be improved and implemented in different ways. Nowadays there are robots that can interact with people, e.g., Papero [10].

Nonetheless, many real-world problems require a degree of flexibility that is difficult to achieve using hand-programmed algorithms. One such domain is vision-based, primarily associated with images; ANNs are mainly used for character, voice and face recognition. In this paper, ANNs are used for speech and image recognition [11].

The purpose of this part of the work is to make a real-time voice request to the computer; the computer receives the audio, processes it and decides which solid it is, through the ANN
978-1-4244-2352-1/08/$25.00 ©2008 IEEE