International Journal of Scientific & Engineering Research Volume 2, Issue 11, November-2011
ISSN 2229-5518
IJSER © 2011 http://www.ijser.org

Optimal Wavelet for Bangla Vowel Synthesis

Shahina Haque, Tomio Takara

Abstract— Conventional methods use the Fourier Transform (FT) for Bangla vowel synthesis, which suffers from a resolution problem. To obtain better accuracy, we applied the Wavelet Transform (WT) with several wavelet families to analyze and synthesize the seven Bangla vowels. The performance measures used to select the optimal wavelet for Bangla phoneme synthesis are normalized root mean square error (NRMSE), signal-to-noise ratio (SNR), peak signal-to-noise ratio (PSNR), and retained energy (RE) of the first few coefficients of the first approximation decomposition. Our work is centered on the following wavelet families: Daubechies, Coiflet, Symmlet, Biorthogonal and Reverse Biorthogonal. We observe that the Symmlet 8 (sym8) wavelet at decomposition level 5 stores more than 98% of the energy in the first few approximation coefficients with moderate SNR and PSNR, and reproduces the signal with the lowest NRMSE.

Index Terms— Bangla vowels, Wavelet Transform, Daubechies, Coiflet, Symmlet, Biorthogonal, Reverse Biorthogonal

1 INTRODUCTION

Signal processing and filtering is, in its modest way, an attempt to find a better form for a set of information, either by reshaping it or by filtering out selected parts that are sometimes labeled as noise. In other words, signal processing allows us to uncover a form of the signal that is closer to the true signal. Speech analysis systems generally rely on time-frequency representations such as Short Time Fourier Transforms (STFTs) or Linear Predictive Coding (LPC) techniques.
In some respects, these methods may not be suitable for representing speech, as they assume signal stationarity within a given time frame and may therefore lack the ability to analyze localized events accurately. Furthermore, the LPC approach assumes a particular linear (all-pole) model of speech production, which, strictly speaking, does not hold. The main disadvantage of a Fourier expansion, however, is that it has only frequency resolution and no time resolution [1]. This means that although all the frequencies present in a signal can be determined, the times at which disturbances occur are not known.

To overcome this problem, several solutions have been developed to represent a signal in the time and frequency domains at the same time. The WT is one of the most recent solutions to overcome the shortcomings of the FT. In wavelet analysis, the use of a fully scalable modulated window solves the signal-cutting problem. The window is shifted along the signal and for every position the spectrum is calculated. This process is then repeated many times with a slightly shorter or longer window for every new cycle. In the end, the result is a collection of time-frequency representations of the signal, all with different resolutions. The WT overcomes some of these limitations; it can provide a constant-Q analysis of a given signal by projection onto a set of basis functions that are scale variant with frequency. Each wavelet is a shifted, scaled version of an original or mother wavelet. These families are usually orthogonal to one another, which is important since it yields computational efficiency and ease of numerical implementation. Other factors influencing the choice of the WT over conventional methods include its ability to capture localized features. Also, developments aimed at generalization, such as the Best-Basis Paradigm of Coifman and Wickerhauser [2], make for more flexible and useful representations.
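As a hedged illustration in standard notation (the symbols a, b, x(t) and ψ are not defined in this paper and are assumed here), the family of shifted, scaled wavelets and the projection of a signal onto it can be written as:

```latex
% Wavelet family generated from the mother wavelet \psi
% a: scale (assumed a \neq 0), b: time shift
\psi_{a,b}(t) = \frac{1}{\sqrt{|a|}}\,\psi\!\left(\frac{t-b}{a}\right)

% Wavelet coefficient: projection of the signal x(t) onto \psi_{a,b}
W_x(a,b) = \int_{-\infty}^{\infty} x(t)\,\psi_{a,b}^{*}(t)\,dt
```

For a > 0, scaling by a divides both the centre frequency and the bandwidth of ψ by a, so their ratio Q stays the same at every scale. This is the sense in which the analysis is constant-Q.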
The indications are that the WT and its variants are useful in speech parameter extraction because of their good feature localization and, furthermore, because more accurate (non-linear) speech production models can be assumed [3]. The adaptive nature of some existing techniques results in a reduction of error due to speaker variation. The continuous WT (CWT) is defined as the sum over all time of the signal multiplied by scaled, shifted versions of the wavelet function.

In different languages, the WT has been used for various speech processing tasks, e.g. speech analysis, pitch detection, recognition, speech synthesis and speech segmentation [4,5,6,7,8]. But as far as we know, no work has yet been reported on Bangla phoneme analysis and synthesis using the WT. Therefore, we consider the possibility of providing complete WT-based Bangla speech processing in the most computationally efficient manner. As an initial stage of our work, we selected the seven Bangla vowel phonemes. We analyzed and synthesized the Bangla vowels using the widely used Daubechies family of wavelets with the WT.

The organization of the paper is as follows. Section 2 discusses the theory of the WT, wavelets, and speech waveform decomposition and reconstruction using the WT. Section 3 describes the procurement of the experimental data. Section 4 discusses the application of the WT to Bangla phoneme analysis and synthesis. Section 5 discusses the results and performance evaluation of our experiment. Section 6 provides the conclusion and scope for future work.

2 WAVELETS AND SPEECH

The fundamental idea behind wavelets is to analyse according to scale. The wavelet analysis procedure is to adopt a wavelet prototype function called an analysing wavelet or mother wavelet. Any signal can then be represented by translated and scaled versions of the mother wavelet.
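The idea of representing a signal at several scales and rebuilding it from the coefficients can be sketched in a few lines of Python. This is not the authors' implementation. It makes three assumptions: it uses the Haar wavelet (the simplest member of the Daubechies family) so that only the standard library is needed, it uses a synthetic two-tone signal in place of recorded vowels, and it uses common textbook definitions of retained energy and NRMSE that may differ from those used in this paper. All function names are illustrative.

```python
import math

SQRT_HALF = 1.0 / math.sqrt(2.0)

def haar_dwt(x):
    """One level of the orthonormal Haar DWT (len(x) must be even)."""
    approx = [(x[i] + x[i + 1]) * SQRT_HALF for i in range(0, len(x), 2)]
    detail = [(x[i] - x[i + 1]) * SQRT_HALF for i in range(0, len(x), 2)]
    return approx, detail

def haar_idwt(approx, detail):
    """Invert one Haar level."""
    out = []
    for a, d in zip(approx, detail):
        out.append((a + d) * SQRT_HALF)
        out.append((a - d) * SQRT_HALF)
    return out

def wavedec(x, levels):
    """Multi-level decomposition, returned as [cA_n, cD_n, ..., cD_1]."""
    coeffs = []
    approx = list(x)
    for _ in range(levels):
        approx, detail = haar_dwt(approx)
        coeffs.insert(0, detail)
    coeffs.insert(0, approx)
    return coeffs

def waverec(coeffs):
    """Rebuild the signal from [cA_n, cD_n, ..., cD_1]."""
    approx = coeffs[0]
    for detail in coeffs[1:]:
        approx = haar_idwt(approx, detail)
    return approx

def energy(x):
    return sum(v * v for v in x)

def retained_energy(coeffs):
    """Percentage of coefficient energy held by the approximation band."""
    total = sum(energy(c) for c in coeffs)
    return 100.0 * energy(coeffs[0]) / total

def nrmse(original, reconstructed):
    """RMS error normalised by the signal's spread about its mean."""
    mean = sum(original) / len(original)
    num = sum((o - r) ** 2 for o, r in zip(original, reconstructed))
    den = sum((o - mean) ** 2 for o in original)
    return math.sqrt(num / den)

def reconstruct_from_approximation(coeffs):
    """Zero every detail band and rebuild from the approximation only."""
    kept = [coeffs[0]] + [[0.0] * len(d) for d in coeffs[1:]]
    return waverec(kept)

# Synthetic stand-in for a vowel frame: two low-frequency tones, 256 samples.
N = 256
signal = [math.sin(2 * math.pi * 4 * i / N) + 0.3 * math.sin(2 * math.pi * 12 * i / N)
          for i in range(N)]

coeffs = wavedec(signal, 3)
approx_only = reconstruct_from_approximation(coeffs)
print("retained energy (%):", retained_energy(coeffs))
print("NRMSE (approximation only):", nrmse(signal, approx_only))
```

Because the Haar transform used here is orthonormal, the energy discarded with the detail bands equals the reconstruction error energy, which is why retained energy and NRMSE move together in this sketch. A smoother wavelet such as sym8 would need a filter-bank implementation or a wavelet library.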
Wavelet analysis is capable of revealing aspects of data that other signal analysis techniques such as Fourier analysis miss, aspects like trends, breakdown points, discontinuities in higher derivatives, and

S.H. Author is with the Department of Electronics and Telecommunication Engineering, Daffodil International University, Dhaka, Bangladesh.
T.T. Author is with the Faculty of Information Engineering, University of the Ryukyus, Okinawa, Japan.