WAVELET DECOMPOSITION OF VOICED SPEECH AND MATHEMATICAL MORPHOLOGY ANALYSIS FOR GLOTTAL CLOSURE INSTANTS DETECTION

Amel Ben Slimane Rahmouni (1),(2), Aicha Bouzid (1),(3), Noureddine Ellouze (1)
(1) ENIT (LSTS), (2) ESSTT, (3) ISET Sfax
BP. 37, Le Belvédère 1002 Tunis, Tunisia
Tel: (216) 71874700; fax: (216) 71872729
e-mail: Amel.Rahmouni@esstt.rnu.tn, Aicha.bouzid@enit.rnu.tn

ABSTRACT

This paper presents a robust algorithm for the detection of glottal closure instants (GCIs) in speech signals. The algorithm uses a multi-scale analysis based on a dyadic wavelet filterbank. Significant minima and maxima of the filtered signals are localized at each scale using an adaptive mathematical morphology erosion transformation. With reference to the GCIs detected from the laryngograph signal, a robust strategy for GCI localization was derived: each GCI is determined as the position of a minimum suitably chosen on one of the outputs of the different filters. This choice aims to ensure the best accuracy and reliability, even for weak glottal effort.

1 INTRODUCTION

Pitch detection research shows great interest in analyzing voiced speech period by period, over intervals delimited by two successive instants of glottal closure. Glottal closure instants carry important information about the speech signal: prosodic parameters such as voicing degree and voicing frequency (F0) can be derived from them. Efficient detection and estimation of pitch has many applications in audio signal processing. For example, pitch is very useful in speech processing applications such as speech and language recognition, speaker identification and speech synthesis. Determination of GCIs also allows pitch-synchronous processing of speech signals.

Glottal closure instants are often points of sharp variation, or singularities, in the speech signal.
According to Mallat [1], the wavelet transform has demonstrated excellent capabilities for the detection of singularities in signals. Furthermore, in recent years, wavelet transforms have been intensively applied in different pitch detection algorithms [2] [3] [4]. Most of those algorithms are based on the dyadic wavelet transform, i.e., a constant dilation factor equal to 2. Vu Ngoc [5] proposes a speech representation in the time-scale domain based on the wavelet transform and a filterbank implementation. The main idea is that all dyadic scales are used for speech analysis. As a result, not only are high-frequency features analyzed with accuracy, but smooth singularities in the signal can also be detected. The present work explores a similar concept and proposes a robust strategy for glottal closure instant detection. The proposed strategy relies on the time localization of significant minima and maxima of the filterbank outputs. A specific mathematical morphology erosion transformation is used for minima and maxima detection. The proposed algorithm takes its decision from the minima at different scales, retaining those giving the best estimation of the GCIs.

This paper is organized as follows. After this introduction, section 2 describes the dyadic wavelet transform used to analyze speech signals. Section 3 then focuses on the peak detection algorithm. In section 4, we present the GCI determination strategy and some experimental results. Finally, concluding remarks are given in section 5.

2 DYADIC WAVELET TRANSFORM OF SPEECH SIGNAL

The wavelet transform is a powerful mathematical tool for hierarchical function decomposition. It allows a function to be described in terms of a coarse overall shape, plus details ranging from broad to narrow, and thus offers an elegant technique for representing different levels of detail. Signal characteristics can be efficiently located in both the time and frequency domains.
Thus, unlike the Short-Time Fourier Transform (STFT), wavelets are well suited to the study of non-stationary and unpredictable signals containing both low-frequency components and sharp transitions. The wavelet transform provides a multi-resolution, multi-scale analysis which has been shown to be very well suited to speech processing.

We used a specific filterbank [5] to perform a multi-scale analysis of the voiced speech signal. The mother wavelet g(t) used is given by equation (1):

g(t) = -cos(2πf0 t) exp(-t²/2τ²).   (1)

The transformed signal y_i(t) at scale i of x(t) is given by equation (2):

y_i(t) = x(t) * g(t/s_i).   (2)
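The analysis described above can be sketched in a few lines of NumPy. This is a minimal illustration, not the authors' implementation: it assumes dyadic scales s_i = 2^i (consistent with the stated dilation factor of 2), and the values of f0, τ, the sampling rate, and the structuring-element width are illustrative, since they are not given in this excerpt. The flat-element grayscale erosion stands in for the paper's adaptive erosion: a sample is flagged as a local minimum where the signal equals its own erosion.

```python
import numpy as np

def mother_wavelet(t, f0=1.0, tau=0.5):
    # Equation (1): g(t) = -cos(2*pi*f0*t) * exp(-t^2 / (2*tau^2))
    return -np.cos(2 * np.pi * f0 * t) * np.exp(-t**2 / (2 * tau**2))

def dyadic_filterbank(x, fs, n_scales=4, f0=1.0, tau=0.5):
    """Equation (2): y_i = x * g(t / s_i), with assumed dyadic scales s_i = 2^i."""
    outputs = []
    for i in range(1, n_scales + 1):
        s = 2.0 ** i
        # Sample the dilated wavelet on a support covering its Gaussian envelope.
        half = int(np.ceil(4 * tau * s * fs))
        t = np.arange(-half, half + 1) / fs
        kernel = mother_wavelet(t / s, f0, tau)
        outputs.append(np.convolve(x, kernel, mode='same'))
    return outputs

def flat_erosion(y, half_width):
    """Grayscale erosion of y with a flat structuring element of size 2*half_width+1."""
    n = len(y)
    eroded = np.empty(n)
    for k in range(n):
        lo, hi = max(0, k - half_width), min(n, k + half_width + 1)
        eroded[k] = y[lo:hi].min()
    return eroded

def significant_minima(y, half_width):
    """Indices where y coincides with its erosion, i.e., local minima of y."""
    return np.flatnonzero(y == flat_erosion(y, half_width))
```

In this sketch, GCI candidates would be collected by running significant_minima on each filterbank output; the paper's strategy (section 4) then selects, at each glottal cycle, the minimum from the scale giving the most reliable localization.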