International Journal of Scientific & Engineering Research Volume 2, Issue 11, November-2011
ISSN 2229-5518
IJSER © 2011
http://www.ijser.org
Optimal Wavelet for Bangla Vowel Synthesis
Shahina Haque, Tomio Takara
Abstract— Conventional methods use the Fourier Transform (FT) for Bangla vowel synthesis, which has a resolution problem. To obtain better
accuracy, we applied the Wavelet Transform (WT) with several wavelet families to analyze and synthesize the seven Bangla vowels. The parameters
used to evaluate performance and select the optimal wavelet for Bangla phoneme synthesis are normalized root mean square error (NRMSE), signal to
noise ratio (SNR), peak signal to noise ratio (PSNR), and retained energy (RE) of the first few coefficients of the first approximation decomposition. Our
work centers on the following wavelet families: Daubechies, Coiflet, Symmlet, Biorthogonal and Reverse Biorthogonal. Our study shows
that the Symmlet 8 (sym8) wavelet at decomposition level 5 stores more than 98% of the energy in the first few approximation coefficients, with
moderate SNR and PSNR, and reproduces the signal with the lowest NRMSE.
Index Terms— Bangla vowels, Wavelet Transform, Daubechies, Coiflet, Symmlet, Biorthogonal, Reverse Biorthogonal
1 INTRODUCTION
Signal processing and filtering is, in its modest way, an
attempt to find a better form for a set of information, either
by reshaping it or by filtering out selected parts that are
sometimes labeled as noise. In other words, signal processing
allows us to uncover a form of the signal that is closer to the true
signal. Speech analysis systems generally carry out analysis
via time-frequency representations such as
Short Time Fourier Transforms (STFTs), or via Linear
Predictive Coding (LPC) techniques. In some respects, these
methods may not be suitable for representing speech, as they
assume signal stationarity within a given time frame and may
therefore lack the ability to analyze localized events accurately.
Furthermore, the LPC approach assumes a particular linear
(all-pole) model of speech production which, strictly speaking,
does not hold. The main disadvantage of a Fourier expansion,
however, is that it has only frequency resolution and no time
resolution [1]. This means that although all the frequencies
present in a signal can be determined, the time at which a
disturbance occurs is not known.
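The lack of time resolution can be illustrated with a small numerical sketch. The signal length, the two tone frequencies and the function names below are illustrative assumptions, not values from our experiment. Two signals contain the same pair of tones in opposite temporal order, yet their Fourier magnitude spectra coincide, so the magnitude spectrum alone cannot tell which tone came first.

```python
import cmath
import math


def dft_magnitudes(signal):
    """Return |X[k]| of the discrete Fourier transform of a real sequence."""
    n = len(signal)
    mags = []
    for k in range(n):
        acc = 0j
        for t, x in enumerate(signal):
            acc += x * cmath.exp(-2j * math.pi * k * t / n)
        mags.append(abs(acc))
    return mags


def two_tone_signal(n, first_bin, second_bin):
    """First half oscillates at first_bin cycles per n samples, second half at second_bin."""
    half = n // 2
    return [
        math.sin(2 * math.pi * (first_bin if t < half else second_bin) * t / n)
        for t in range(n)
    ]


N = 64  # assumed length; even bin numbers keep each tone periodic over N/2 samples
low_then_high = two_tone_signal(N, 4, 12)
high_then_low = two_tone_signal(N, 12, 4)

mags_a = dft_magnitudes(low_then_high)
mags_b = dft_magnitudes(high_then_low)

# Largest difference between the two magnitude spectra (only rounding error remains).
max_spectrum_gap = max(abs(a - b) for a, b in zip(mags_a, mags_b))
# Largest difference between the two waveforms themselves (clearly non-zero).
max_sample_gap = max(abs(a - b) for a, b in zip(low_then_high, high_then_low))

print("max spectrum difference:", max_spectrum_gap)
print("max waveform difference:", max_sample_gap)
```

With these particular tone choices one signal is a circular shift of the other, which is why the magnitudes agree. The ordering information is carried entirely by the phase, which is hard to interpret as a time location.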
To overcome this problem, several solutions have been
developed to represent a signal in the time and frequency
domains at the same time. The WT is one of the most recent
solutions to the shortcomings of the FT. In wavelet
analysis, the use of a fully scalable modulated window solves
the signal-cutting problem. The window is shifted along the
signal and the spectrum is calculated for every position. This
process is then repeated many times with a slightly shorter or
longer window for every new cycle. The result is
a collection of time-frequency representations of the signal,
all with different resolutions. The WT thus overcomes some of these
limitations; it can provide a constant-Q analysis of a given
signal by projection onto a set of basis functions that are scale
variant with frequency. Each wavelet is a shifted, scaled
version of an original or mother wavelet. These families are
usually orthogonal to one another, which is important because it
yields computational efficiency and ease of numerical implementation.
Other factors influencing the choice of the WT over conventional
methods include its ability to capture localized features.
Also, developments aimed at generalization, such as the
Best-Basis paradigm of Coifman and Wickerhauser [2], make for
more flexible and useful representations. The indications are
that the WT and its variants are useful in speech parameter
extraction, both because of their good feature localization and
because more accurate (non-linear) speech production
models can be assumed [3]. The adaptive nature of some existing
techniques results in a reduction of error due to speaker
variation. The continuous WT (CWT) is defined as
the sum over all time of the signal multiplied by scaled,
shifted versions of the wavelet function.
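The verbal definition of the CWT above can be written compactly as follows. The symbols are assumptions introduced here for illustration: x(t) is the signal, \psi the mother wavelet, a the scale, b the shift, and the factor 1/\sqrt{|a|} is the usual energy-normalizing convention.

```latex
\psi_{a,b}(t) = \frac{1}{\sqrt{|a|}}\,\psi\!\left(\frac{t-b}{a}\right),
\qquad
W_x(a,b) = \int_{-\infty}^{\infty} x(t)\,\psi_{a,b}^{*}(t)\,dt,
\qquad a \neq 0 .
```

Small values of |a| compress the wavelet and probe short, high-frequency events, while large values stretch it and probe slow trends.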
WT has been used in various languages for a range of
speech processing tasks, e.g. speech analysis, pitch detection,
recognition, speech synthesis and speech segmentation [4,5,6,7,8].
However, as far as we know, no work has yet been reported on
Bangla phoneme analysis and synthesis using WT.
Therefore, we consider the possibility of providing complete
WT-based Bangla speech processing in the most computationally
efficient manner. As an initial stage of our work, we
selected the seven Bangla vowel phonemes. We analyzed and
synthesized the Bangla vowels using the widely used Daubechies
family of wavelets with WT.
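The analysis and synthesis loop, together with the evaluation parameters named in the abstract, can be sketched as follows. This is a minimal illustration under stated assumptions, not the procedure used in our experiments: it uses the Haar wavelet (the simplest member of the Daubechies family, db1) as a stand-in, a synthetic test signal instead of recorded vowels, three decomposition levels, and common textbook definitions of NRMSE, SNR, PSNR and retained energy that may differ in detail from ours.

```python
import math

SQRT2 = math.sqrt(2.0)


def haar_step(x):
    """One analysis level: split x into approximation and detail halves."""
    approx = [(x[i] + x[i + 1]) / SQRT2 for i in range(0, len(x), 2)]
    detail = [(x[i] - x[i + 1]) / SQRT2 for i in range(0, len(x), 2)]
    return approx, detail


def haar_inverse_step(approx, detail):
    """One synthesis level: interleave the samples rebuilt from each pair."""
    out = []
    for a, d in zip(approx, detail):
        out.extend(((a + d) / SQRT2, (a - d) / SQRT2))
    return out


def decompose(x, levels):
    """Return (coarsest approximation, [details from finest to coarsest])."""
    approx, details = list(x), []
    for _ in range(levels):
        approx, d = haar_step(approx)
        details.append(d)
    return approx, details


def reconstruct(approx, details):
    x = list(approx)
    for d in reversed(details):
        x = haar_inverse_step(x, d)
    return x


def evaluate(original, rebuilt):
    """Assumed metric definitions; the error must be non-zero."""
    n = len(original)
    mean = sum(original) / n
    err = sum((o - r) ** 2 for o, r in zip(original, rebuilt))
    energy = sum(o * o for o in original)
    spread = sum((o - mean) ** 2 for o in original)
    peak = max(abs(o) for o in original)
    return {
        "nrmse": math.sqrt(err / spread),
        "snr_db": 10 * math.log10(energy / err),
        "psnr_db": 10 * math.log10(n * peak * peak / err),
    }


# Hypothetical test signal: two slow components plus a weak fast one.
N, LEVELS = 256, 3
signal = [
    math.sin(2 * math.pi * 2 * t / N)
    + 0.5 * math.sin(2 * math.pi * 5 * t / N)
    + 0.1 * math.sin(2 * math.pi * 60 * t / N)
    for t in range(N)
]

approx, details = decompose(signal, LEVELS)

# Sanity check: keeping every coefficient reproduces the signal.
max_roundtrip_error = max(
    abs(s - r) for s, r in zip(signal, reconstruct(approx, details))
)

# Synthesis from the approximation coefficients only (details set to zero).
zeroed = [[0.0] * len(d) for d in details]
rebuilt = reconstruct(approx, zeroed)

total_energy = sum(c * c for c in approx) + sum(c * c for d in details for c in d)
retained_energy_pct = 100.0 * sum(c * c for c in approx) / total_energy
metrics = evaluate(signal, rebuilt)

print("retained energy (%):", retained_energy_pct)
print(metrics)
```

Because the Haar transform is orthonormal, the energy of the coefficients equals the energy of the signal, so the retained-energy figure directly measures how much of the signal survives when only the approximation coefficients are kept. Comparing wavelets then amounts to repeating this loop with a different filter pair and tabulating the four numbers.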
The paper is organized as follows. Section 2 discusses the
theory of the WT, wavelets, and speech waveform decomposition and
reconstruction using the WT. Section 3 describes
the procurement of the experimental data. Section 4 describes
the application of the WT to Bangla phoneme analysis
and synthesis. Section 5 discusses the results and
performance evaluation of our experiment. Section 6 provides
the conclusion and scope for future work.
2 WAVELETS AND SPEECH
The fundamental idea behind wavelets is to analyse according
to scale. The wavelet analysis procedure is to adopt a wavelet
prototype function called an analysing wavelet or mother
wavelet. Any signal can then be represented by translated and
scaled versions of the mother wavelet. Wavelet analysis is
capable of revealing aspects of data that other signal analysis
techniques, such as Fourier analysis, miss: aspects such as trends,
breakdown points, discontinuities in higher derivatives, and
————————————————
S.H. Author is with the Department of Electronics and Telecommunication
Engineering, Daffodil International University, Dhaka, Bangladesh.
T.T. Author is with Faculty of Information Engineering, University of the
Ryukyus, Okinawa, Japan.