NEURAL NETWORKS FOR VOICE RECOGNITION

Geok See Ng, S. S. Erdogan, Pan Wei Ng
Nanyang Technological University, School of Applied Science
Division of Computer Technology, Nanyang Avenue, Singapore 2263

ABSTRACT

Artificial Neural Networks (ANNs) have been used to perform classification for Automatic Speech Recognition (ASR). In this paper, we propose a new neural network, the Contenders' Network (CN), which requires little initial knowledge of the classification problem and fewer neurons than other ANNs. We present the CN, the motivation for its development, and how it works. We compare its performance with RCE, LVQ and DVQ using some Artificially Generated Vectors (AGVs). The CN is then applied to real speech feature vectors.

1. INTRODUCTION

ANNs are a fast-emerging technology. Their ability to compute complex decision surfaces with numerous processing elements allows them to classify objects and make complex decisions. We apply this technology in the area of Automatic Speech Recognition (ASR). Although numerous techniques have been developed, ASR is still not widely used due to its cost. One of the objectives of our work is to make the process of ASR cheap, fast and readily accessible. The hardware that we use is a simple 8-bit ADC/DAC from a SoundBlaster Pro card on an 80486 PC. Experiments are conducted in environments with some degree of background noise on spoken Mandarin digits, and are restricted to isolated word recognition.

Our approach is to develop a firm foundation of reusable software routines to perform speech signal processing and to use them with some ANNs to perform phoneme classification and word classification. We implemented some signal processing routines to perform feature extraction. Other paradigms, namely Restricted Coulomb Energy (RCE) [4], Learning Vector Quantization (LVQ) [1] and Dynamic Vector Quantization (DVQ) [2], are investigated and compared with the proposed Contenders' Network (CN). The RCE network creates a hypersphere with an initial radius when an input vector is not covered by any existing hypersphere of its class.

For demonstration, a square puzzle game was built that uses speech as input and runs in real time. With 12 commands, it achieves an accuracy of 93.6% and a response time of 6 seconds per command.

2. CURRENT NETWORK CLASSIFIERS

ANNs are widely used in ASR. However, many of them, especially those that use the Backpropagation [5] learning algorithm, require excessive training time. Hence, some ANNs that have relatively shorter training times are chosen for comparison in this paper: LVQ, DVQ and RCE. Another reason these ANNs are chosen is that they have similar architectures, which allows us to train a network using the learning algorithm of one paradigm and perform classification using another. Each architecture has three layers: one input, one internal, and one output (see Figure 1). The number of input neurons corresponds to the number of elements in the input feature vector. The internal neurons are the reference patterns in the feature space against which comparisons are made according to some distance measure. The output neurons correspond to the various classes in the feature space. The input and internal layers are fully connected, and each internal neuron has exactly one link to an output neuron. The networks have only feedforward connections, and all of them use supervised learning.

[Figure 1. Neural network architecture: the input layer corresponds to the input vector; each internal node corresponds to one reference pattern; the output layer corresponds to the classes of the feature space.]
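The shared three-layer prototype architecture described in Section 2 can be sketched in code. The sketch below is illustrative only, under our own assumptions: the class name `PrototypeClassifier`, the Euclidean distance measure, and the fixed `initial_radius` are ours, not from the paper, and the training rule shown is a simplified RCE-style commit (a new internal neuron is created when no hypersphere of the correct class covers the input), not the CN, LVQ or DVQ learning algorithms.

```python
import math

class PrototypeClassifier:
    """Minimal sketch of the shared architecture: each internal neuron
    stores one reference pattern and links to exactly one class label."""

    def __init__(self):
        # Internal layer: list of (reference vector, class label) pairs.
        self.prototypes = []

    @staticmethod
    def _distance(x, y):
        # Euclidean distance, one common choice of distance measure.
        return math.sqrt(sum((a - b) ** 2 for a, b in zip(x, y)))

    def classify(self, x):
        # Feedforward pass: compare the input feature vector against
        # every reference pattern; output the class of the nearest one.
        _, label = min(
            ((self._distance(x, p), l) for p, l in self.prototypes),
            key=lambda t: t[0],
        )
        return label

    def train_rce_style(self, x, label, initial_radius=1.0):
        # RCE-like commit rule: create a new internal neuron (a
        # hypersphere centred on x with an initial radius) when no
        # existing prototype of the correct class covers the input.
        covered = any(
            l == label and self._distance(x, p) <= initial_radius
            for p, l in self.prototypes
        )
        if not covered:
            self.prototypes.append((list(x), label))

clf = PrototypeClassifier()
clf.train_rce_style([0.0, 0.0], "class_a")
clf.train_rce_style([5.0, 5.0], "class_b")
print(clf.classify([0.5, 0.2]))  # nearest prototype belongs to class_a
```

Because the three paradigms share this structure, one could train the prototype list with one learning rule and still classify with the same nearest-pattern feedforward pass, which is the interchangeability the paper exploits.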