WAVELET DECOMPOSITION OF VOICED SPEECH AND MATHEMATICAL MORPHOLOGY ANALYSIS FOR GLOTTAL CLOSURE INSTANTS DETECTION

Amel Ben Slimane Rahmouni (1),(2), Aicha Bouzid (1),(3), Noureddine Ellouze (1)
(1) ENIT (LSTS), (2) ESSTT, (3) ISET Sfax
BP. 37, Le Belvédère 1002 Tunis, Tunisia
Tel: (216) 71874700; fax: (216) 71872729
e-mail: Amel.Rahmouni@esstt.rnu.tn, Aicha.bouzid@enit.rnu.tn

ABSTRACT

This paper presents a robust algorithm for the detection of glottal closure instants (GCIs) in speech signals. The algorithm uses a multi-scale analysis based on a dyadic wavelet filterbank. Significant minima and maxima of the filtered signals are localized at each scale using an adaptive mathematical morphology erosion transformation. With reference to the GCIs detected from the laryngograph signal, a robust strategy for GCI localization was derived: each GCI is determined as the position of a minimum suitably chosen on one of the outputs of the different filters. This choice aims to ensure the best accuracy and reliability, even for weak glottal effort.

1 INTRODUCTION

Pitch detection research shows great interest in analyzing voiced speech period by period, over intervals delimited by two successive instants of glottal closure. Glottal closure instants carry important information about the speech signal: prosodic parameters such as voicing degree and voicing frequency (F0) can be derived from them. Efficient detection and estimation of pitch has many applications in audio signal processing. For example, pitch is very useful in speech processing applications such as speech and language recognition, speaker identification and speech synthesis. Determination of GCIs also allows pitch-synchronous processing of speech signals.

Glottal closure instants are often points of sharp variation, or singularities, in the speech signal.
According to Mallat [1], the wavelet transform has demonstrated excellent capabilities for the detection of singularities in signals. Furthermore, in recent years, wavelet transforms have been intensively applied in different pitch detection algorithms [2] [3] [4]. Most of those algorithms are based on the dyadic wavelet transform, i.e., a constant dilation factor equal to 2. Vu Ngoc [5] proposes a speech representation in the time-scale domain based on the wavelet transform and a filterbank implementation. The main idea is that all dyadic scales are used for speech analysis. As a result, not only are high-frequency features analyzed with accuracy, but smooth singularities in the signal can also be detected. The present work explores a similar concept and proposes a robust strategy for glottal closure instant detection. The proposed strategy relies on the time localization of significant minima and maxima of the filterbank outputs. A specific mathematical morphology erosion transformation is used for minima and maxima detection. The proposed algorithm takes its decision from the minima at different scales, retaining those giving the best estimation of the GCIs.

This paper is organized as follows. After this introduction, section 2 describes the dyadic wavelet transform used to analyze speech signals. Section 3 then focuses on the peak detection algorithm. In section 4, we present the GCI determination strategy and some experimental results. Finally, concluding remarks are given in section 5.

2 DYADIC WAVELET TRANSFORM OF SPEECH SIGNAL

The wavelet transform is a powerful mathematical tool for hierarchical function decomposition. It allows a function to be described in terms of a coarse overall shape, plus details ranging from broad to narrow, and thus offers an elegant technique for representing different levels of detail. Signal characteristics can be efficiently located in both the time and frequency domains.
Thus, unlike the Short-Time Fourier Transform (STFT), wavelets are well suited to the study of non-stationary and unpredictable signals containing both low-frequency components and sharp transitions. The wavelet transform provides a multi-resolution, multi-scale analysis which has been shown to be very well suited to speech processing.

We used a specific filterbank [5] to perform a multi-scale analysis of the voiced speech signal. The mother wavelet g(t) used is given by equation (1):

g(t) = -cos(2πf0 t) exp(-t²/2τ²).   (1)

The transformed signal y_i(t) at scale i of x(t) is given by equation (2):

y_i(t) = x(t) * g(t/s_i).   (2)
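The analysis described above can be sketched in a few lines of NumPy. This is a minimal illustration, not the authors' implementation: it assumes dyadic scales s_i = 2^i (consistent with the stated dilation factor of 2), and the values of f0, τ, the sampling rate, and the structuring-element width are illustrative, since they are not given in this excerpt. The flat-element grayscale erosion stands in for the paper's adaptive erosion: a sample is flagged as a local minimum where the signal equals its own erosion.

```python
import numpy as np

def mother_wavelet(t, f0=1.0, tau=0.5):
    # Equation (1): g(t) = -cos(2*pi*f0*t) * exp(-t^2 / (2*tau^2))
    return -np.cos(2 * np.pi * f0 * t) * np.exp(-t**2 / (2 * tau**2))

def dyadic_filterbank(x, fs, n_scales=4, f0=1.0, tau=0.5):
    """Equation (2): y_i = x * g(t / s_i), with assumed dyadic scales s_i = 2^i."""
    outputs = []
    for i in range(1, n_scales + 1):
        s = 2.0 ** i
        # Sample the dilated wavelet on a support covering its Gaussian envelope.
        half = int(np.ceil(4 * tau * s * fs))
        t = np.arange(-half, half + 1) / fs
        kernel = mother_wavelet(t / s, f0, tau)
        outputs.append(np.convolve(x, kernel, mode='same'))
    return outputs

def flat_erosion(y, half_width):
    """Grayscale erosion of y with a flat structuring element of size 2*half_width+1."""
    n = len(y)
    eroded = np.empty(n)
    for k in range(n):
        lo, hi = max(0, k - half_width), min(n, k + half_width + 1)
        eroded[k] = y[lo:hi].min()
    return eroded

def significant_minima(y, half_width):
    """Indices where y coincides with its erosion, i.e., local minima of y."""
    return np.flatnonzero(y == flat_erosion(y, half_width))
```

In this sketch, GCI candidates would be collected by running significant_minima on each filterbank output; the paper's strategy (section 4) then selects, at each glottal cycle, the minimum from the scale giving the most reliable localization.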