Recognition of Speech Commands Using a Modified Neural Fuzzy Network and an Improved GA

K.F. Leung, F.H.F. Leung, H.K. Lam and P.K.S. Tam
Centre for Multimedia Signal Processing, Department of Electronic and Information Engineering, The Hong Kong Polytechnic University, Hung Hom, Kowloon, Hong Kong

Abstract: This paper presents the recognition of speech commands using a modified neural-fuzzy network. To train the parameters of the network, an improved genetic algorithm is proposed. As an application example, the proposed speech recognition approach is implemented experimentally in an electronic book to illustrate the design and its merits.

I. INTRODUCTION

When we want to communicate with a machine by speech, it is difficult to make the machine fully recognize our spoken words. In general, the solution to this problem involves two main procedures: feature extraction and classification.

Feature extraction is a preprocessing procedure in a speech recognition system. It is used to extract the specific voice features from the speech signals. In a noise-free environment, each word or phoneme has its corresponding formant frequencies. However, when the environment is noisy, the speech signals are corrupted, and it is difficult to identify their corresponding features. The problem becomes more complicated when the speech to be recognized contains similar phonemes. Researchers have therefore worked on developing distinctive feature extraction techniques. The most commonly used approaches are filter-bank modeling and linear predictive coding (LPC) analysis [1]. Filter-bank modeling involves a bank of band-pass filters, which are used to model the characteristics of the human ear. LPC analysis [1] approximates the current speech sample as a linear combination of its past samples. The time-domain speech signals are first windowed into frames, and the autocorrelation coefficients of each frame are obtained.
This approach mimics the human vocal tract.

Classification is the next procedure, which identifies the input speech based on the feature parameters. Speech signal classification can be done with either a pattern recognition approach or a statistical approach. Artificial neural networks (ANNs) and hidden Markov models (HMMs) are commonly employed in the pattern recognition approach and the statistical approach respectively [1, 8]. ANNs are strong at discrimination [5], and the classification can be done by measuring the closeness of the testing template to the trained templates. However, a large number of mathematical operations will be required if the number of speech samples is large and the duration of the speech is long. HMMs are good at statistical modeling of continuous speech signals. The states in the HMM characterize the phonemes, and the speech is formulated as a sequence of states.

Cantonese digit speech recognition is a challenging task. Cantonese is a nine-tonal and syllabic language [2]. Some digits are difficult to discriminate when they are spoken in Cantonese, such as the digits '1' and '7'. Other human factors introduce additional difficulties in obtaining good performance in Cantonese speech recognition. Good algorithms for speech feature extraction and classification are therefore important for achieving a high success rate in recognizing the spoken words.

The work described in this paper was fully supported by a grant from the Centre for Multimedia Signal Processing, The Hong Kong Polytechnic University (project number A432).

An electronic book (eBook) reader should have no keyboard or mouse. The main input device is a touch screen. As many functions are implemented in a single eBook reader, it is not convenient to access these functions through menus and hot keys alone. By using a small microphone, a one-step commanding process using speech is proposed for eBooks.
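The LPC analysis outlined earlier in this section can be illustrated with a minimal Python sketch of the standard autocorrelation method. It is not necessarily the exact procedure used in this work: the function names (`hamming_frames`, `autocorrelation`, `lpc`), the choice of a Hamming window, and the Levinson-Durbin solver are assumptions introduced for illustration only. The autocorrelation is computed within each frame.

```python
import math


def hamming_frames(signal, frame_len, hop):
    """Split a signal into overlapping frames and apply a Hamming window.

    The Hamming window is an assumed choice; the text only says the
    signal is "windowed into frames".
    """
    window = [0.54 - 0.46 * math.cos(2.0 * math.pi * n / (frame_len - 1))
              for n in range(frame_len)]
    frames = []
    for start in range(0, len(signal) - frame_len + 1, hop):
        frames.append([signal[start + n] * window[n] for n in range(frame_len)])
    return frames


def autocorrelation(frame, max_lag):
    """Autocorrelation r[0..max_lag] of a single frame."""
    n = len(frame)
    return [sum(frame[t] * frame[t + k] for t in range(n - k))
            for k in range(max_lag + 1)]


def lpc(frame, order):
    """Predictor coefficients a[1..order] such that
    s[n] is approximated by sum_k a[k] * s[n - k].

    Solves the autocorrelation normal equations with the
    Levinson-Durbin recursion.
    """
    r = autocorrelation(frame, order)
    alpha = [0.0] * (order + 1)  # alpha[0] is unused
    err = r[0]                   # prediction error energy
    for i in range(1, order + 1):
        # Reflection coefficient for model order i
        k = (r[i] - sum(alpha[j] * r[i - j] for j in range(1, i))) / err
        prev = alpha[:]
        alpha[i] = k
        for j in range(1, i):
            alpha[j] = prev[j] - k * prev[i - j]
        err *= (1.0 - k * k)
    return alpha[1:]
```

In a recognizer, `lpc` would typically be applied to every frame returned by `hamming_frames`, and the resulting coefficient vectors (or features derived from them) would be passed to the classifier. Applying `lpc` directly to the impulse response of a known second-order all-pole filter recovers its two coefficients up to rounding, which is a convenient sanity check.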
To realize speech recognition for commanding, a modified neural fuzzy network (NFN) trained by an improved GA is proposed in this paper. The proposed NFN consists of two NFNs, such that one NFN is responsible for providing the parameters of the other. In this way, the trained NFN has dynamic parameters. Effectively, the rule base for each recognized pattern changes according to the pattern itself. On applying the proposed NFN, the performance of speech recognition is improved, and the training time is shortened. The proposed training algorithm has the advantage of reaching a global solution at a faster rate. In this paper, the modified NFN is used to recognize ten spoken Cantonese digits, and is implemented in a practical eBook reader.

II. MODIFIED NEURAL FUZZY NETWORK

A modified neural fuzzy network is proposed to recognize speech. Referring to Fig. 1, the proposed NFN consists of two NFNs, namely a tuner NFN and a classifier NFN. The parameters of traditional NFNs are usually fixed after training. In the proposed NFN, some parameters of the classifier NFN are adjusted by the tuner NFN (which has fixed parameters after training) to cope with the changing environment during operation. For example, when there are two sets of input-output data, namely S1 and S2, separated by a large distance within a large spatial domain as shown in Fig. 2(a) and (b), it may be difficult for an NFN with fixed parameters and a limited number of rules to identify the features of the data. With the proposed method, the rules of the classifier NFN are governed by the tuner NFN and change according to the network inputs. As a result, each set of input data has its own set of rules in the classifier NFN to handle it. We use a fuzzy associative memory (FAM) [3-4, 15] type of rule base for both the tuner and classifier NFNs.
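The idea of a tuner NFN supplying parameters to a classifier NFN can be sketched in a few lines of Python. This is only an illustration under strong assumptions that are not taken from the paper: a single input, Gaussian membership functions, singleton rule outputs combined by a weighted average, and a tuner that produces the centres of the classifier's membership functions. All names (`gaussian`, `nfn_output`, `modified_nfn`) are hypothetical.

```python
import math


def gaussian(x, centre, width):
    """Gaussian membership value (assumed membership function shape)."""
    return math.exp(-((x - centre) ** 2) / (2.0 * width ** 2))


def nfn_output(x, centres, widths, weights):
    """Single-input fuzzy network: weighted average of singleton rule outputs."""
    firing = [gaussian(x, c, s) for c, s in zip(centres, widths)]
    total = sum(firing)
    if total == 0.0:
        # No rule fires (the input is far from every membership function).
        return 0.0
    return sum(f * w for f, w in zip(firing, weights)) / total


def modified_nfn(x, tuner_params, classifier_widths, classifier_weights):
    """Tuner networks with fixed parameters produce the classifier's rule
    centres from the input x; the classifier then evaluates x with them.

    tuner_params holds one (centres, widths, weights) triple per
    classifier rule. Which classifier parameters are tuned is an
    assumption of this sketch.
    """
    centres = [nfn_output(x, *params) for params in tuner_params]
    y = nfn_output(x, centres, classifier_widths, classifier_weights)
    return y, centres


# Two data sets far apart (around x = 0 and x = 100). The tuner moves the
# classifier's two rules next to whichever region the input falls in.
tuner = [([0.0, 100.0], [20.0, 20.0], [-1.0, 99.0]),
         ([0.0, 100.0], [20.0, 20.0], [1.0, 101.0])]
y_near_s1, centres_s1 = modified_nfn(0.5, tuner, [1.0, 1.0], [0.0, 1.0])
y_near_s2, centres_s2 = modified_nfn(100.5, tuner, [1.0, 1.0], [0.0, 1.0])
```

With these illustrative values, the classifier keeps only two rules, yet its rule centres sit near -1 and 1 for inputs around S1 and near 99 and 101 for inputs around S2. A classifier with two fixed narrow rules could cover only one of the two regions.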
An FAM is formed by partitioning the universe of discourse of each fuzzy variable according to the level of fuzzy resolution chosen for the antecedents, thereby generating a grid of FAM elements. The entry at each grid element in the FAM corresponds to a fuzzy premise. An FAM is thus interpreted as a geometric or tabular representation of a fuzzy logic rule base.

The tuner and classifier NFNs share the same structure, as shown in Fig. 3. We define the input and output variables as $x_i$ and $y_j$ respectively, where $i = 1, 2, \ldots, n_{in}$; $n_{in}$ is the number of input variables; $j = 1, 2, \ldots, n_{out}$; and $n_{out}$ is the number of output variables. The behavior of $y_j$ of the NFN is governed by $m$ fuzzy rules of the following format:

0-7803-7810-5/03/$17.00 ©2003 IEEE 190 The IEEE International Conference on Fuzzy Systems