Wavelet Transforms for Non-Uniform Speech Recognition Systems

Leonard Janer, Josep Marti, Climent Nadeu
Dept. TSC, Universitat Politècnica de Catalunya, 08034 Barcelona, Spain
Email: lwnard@gps.tsc.upc.es

Eduardo Lleida-Solano
GTC, Dept. IEEC, Centro Politécnico Superior de Ingenieros, 50015 Zaragoza, Spain
Email: lleida@mcps.unizar.es

ABSTRACT

A new algorithm for non-uniform speech segmentation and its application in speech recognition systems is presented. A method based on the Modulated Gaussian Wavelet Transform based Speech Analyser (MGWTSA) and a subsequent parametrization block transforms a uniformly sampled signal into a set of non-uniformly separated frames carrying the information to be fed to our speech recognition system. The algorithm aims to place a frame wherever the signal requires one, reducing the number of frames per signal as much as possible without an appreciable reduction in the recognition rate of the system.

1. Introduction

In recent years, the Wavelet Transform (WT) has been applied to different speech processing tasks [4][3] as an efficient front-end, taking advantage of its good time-frequency resolution. Most of those systems are speech coding systems [2][7] or pitch detection systems [6][1][8]. Although some speech recognition systems based on the WT have been designed and tested [5], none of them works with non-uniform parameters, as we do here.

The work presented in this paper involves speech parametrization using the WT and speech recognition using Hidden Markov Models (HMM). The first step is the parametrization: the speech signal is analysed using a Modulated Gaussian Wavelet transform analyser with 17 bands (scales) distributed on a Bark scale [6].
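The paper does not give the analyser's implementation details, but the idea of a 17-band modulated Gaussian (Gabor) wavelet filterbank on a Bark scale can be sketched as follows. The frequency range, the Hz-to-Bark approximation, and the constant-Q bandwidth parameter `q` are assumptions for illustration, not the authors' actual settings.

```python
import numpy as np

def hz_to_bark(f):
    # One common Hz -> Bark approximation (Zwicker-style)
    return 13.0 * np.arctan(0.00076 * f) + 3.5 * np.arctan((f / 7500.0) ** 2)

def bark_centers(n_bands=17, f_lo=100.0, f_hi=5000.0):
    # Place band centres uniformly on the Bark axis, inverting the map numerically
    targets = np.linspace(hz_to_bark(f_lo), hz_to_bark(f_hi), n_bands)
    grid = np.linspace(f_lo, f_hi, 20000)
    return np.interp(targets, hz_to_bark(grid), grid)

def gabor_wavelet(fc, fs, q=2.0):
    # Modulated Gaussian wavelet at centre frequency fc; sigma scales with
    # 1/fc so the analysis is constant-Q (narrower in time at higher scales)
    sigma = q / (2.0 * np.pi * fc)
    t = np.arange(-4 * sigma, 4 * sigma, 1.0 / fs)
    return np.exp(-t**2 / (2 * sigma**2)) * np.exp(2j * np.pi * fc * t)

def analyse(signal, fs, n_bands=17):
    # One complex output signal per scale, sample by sample (same length as input)
    return np.stack([
        np.convolve(signal, gabor_wavelet(fc, fs), mode="same")
        for fc in bark_centers(n_bands)
    ])
```

Each row of the result is one of the 17 temporal decompositions discussed next; the sample-by-sample output matches the paper's description of the current system.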
In this first step the signal is decomposed into 17 different temporal signals, each carrying different frequency information, as they are decompositions of the input speech at 17 scales. At present these generated signals are taken sample by sample, but in the near future the system will work at a coarser precision. Once the signal has been analysed, the output of the analyser is examined to detect instants of relevant information in the input; at each such instant a frame is taken (composed of the 17 scale output samples) and sent to the recognition system.

(*) This work has been supported by the Spanish Ministry of Education and Science (MEC), grant TIC95-0884C04.

In the second section we explain the segmentation algorithms, which determine those relevant instants of information. In the third section the recognition models are presented, and in the following one the results of some tests are shown to assess the performance of both the segmentation algorithm and its application to a speech recognition task. The last section details the conclusions of this work.

2. Speech Segmentation using Wavelets

The segmentation step of our algorithm tries to detect relevant points in the signal. The solutions presented in this paper work basically with the information in two of the 17 scales of the output of the Modulated Gaussian Wavelet Speech Analyser: scales 3 and 9, with central frequencies around 350-450 Hz and 1170-1370 Hz respectively. In the following paragraphs the different solutions are detailed, and their segmentation processes are shown in Figures 1 and 5 for model 1, Figures 2 and 6 for model 2, and Figures 3 and 7 for model 3, for both non-connected and connected digits.

Figure 1: Uniform segmentation with interframe distance equal to 300 samples, for the digit "1", male speaker "ae".

1. Uniformly separated frames: we take a frame at a constant time interval along the signal.
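The specific relevance criteria of models 1-3 are described below; as a generic illustration of the frame-at-relevant-instant idea, the following sketch picks instants as envelope peaks of one analyser scale and then assembles one 17-sample frame per instant. The peak-picking criterion, the threshold, and the minimum inter-frame gap are illustrative assumptions, not any of the paper's models.

```python
import numpy as np

def relevant_instants(scale_out, min_gap=80, thresh_ratio=0.1):
    # Illustrative criterion (not the paper's): local maxima of the envelope
    # of one scale, above a relative threshold, at least min_gap samples apart
    env = np.abs(scale_out)
    thresh = thresh_ratio * env.max()
    instants, last = [], -min_gap
    for n in range(1, len(env) - 1):
        if (env[n] >= env[n - 1] and env[n] > env[n + 1]
                and env[n] > thresh and n - last >= min_gap):
            instants.append(n)
            last = n
    return instants

def take_frames(all_scales, instants):
    # One frame per instant: the 17 scale output samples at that time index
    return np.array([all_scales[:, n] for n in instants])
```

The output is a non-uniformly spaced sequence of 17-dimensional frames, which is the form of parametrization the recognition system consumes.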
This will be our reference model, for which the evaluation is shown in Figure 4 and in Table 1, in the case of non-connected digits.
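The uniform reference model reduces to fixed-step sampling of the analyser output. A minimal sketch, using the 300-sample interframe distance quoted in Figure 1 (the array layout of 17 scales by N samples is an assumption):

```python
import numpy as np

def uniform_frames(all_scales, interframe=300):
    # Reference model: one frame (the 17 scale samples) every `interframe`
    # samples, regardless of signal content
    idx = np.arange(0, all_scales.shape[1], interframe)
    return all_scales[:, idx].T
```

Comparing the non-uniform models against this baseline then amounts to counting frames per signal and measuring the recognition rate with each frame sequence.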