SPECTRAL ESTIMATION FOR SPEECH SIGNALS BASED ON DECIMATION AND EIGENANALYSIS

Pirros Tsiakoulis, Student Member, IEEE, Sotiris Karabetsos, Student Member, IEEE, Stavroula-Evita Fotinea, Ioannis Dologlou

Abstract--This paper details the application of a Decimative Spectral estimation method to speech signals in order to perform spectral analysis and estimation of Formant/Bandwidth values. The method is based on Eigenanalysis and SVD (Singular Value Decomposition) and performs artificial decimation for increased accuracy while exploiting the full set of data samples. The underlying model decomposes a signal into complex damped sinusoids whose frequencies, amplitudes, phases and damping factors are estimated. Correct estimation of Formant/Bandwidth values depends on the model order, i.e., the requested number of poles. Additionally, selection criteria are applied for finer tracking and estimation of speech formants and their respective bandwidths.

Index Terms-- Spectral Estimation, Formants, SVD, Decimation, Speech Processing.

I. INTRODUCTION

Various digital signal processing applications, including speech processing [1] and spectroscopy (e.g., quantification of NMR signals), employ complex damped sinusoidal models to represent a signal segment as a sum of exponentially damped complex-valued sinusoids [2][3]. The generalized model we use is given by

$$ s(n) = \sum_{i=1}^{p} a_i e^{j\phi_i} e^{(-d_i + j2\pi f_i)n} = \sum_{i=1}^{p} g_i z_i^{n}, \qquad n = 0, \dots, N-1 \qquad (1) $$

where $p$ is the number of complex damped sinusoids that comprise the measured signal, $g_i = a_i e^{j\phi_i}$ the complex amplitudes and $z_i = e^{-d_i + j2\pi f_i}$ the signal poles, with $f_i$, $d_i$, $a_i$ and $\phi_i$ the frequency, damping factor, amplitude and phase of the $i$-th component. The objective is to estimate the frequencies, damping factors, amplitudes and phases. In spectral estimation, decimation has played an important role in improving the resolution of the signal under consideration prior to its quantification.
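As a minimal sketch of model (1), the following Python evaluates s(n) from the parameters (a_i, φ_i, d_i, f_i), rewrites each component in pole form (g_i, z_i), and converts a pole into a formant frequency and bandwidth. It assumes f_i is given in cycles per sample, d_i per sample, and a sampling rate fs in Hz, and it uses the standard narrowband relations F = fs·arg(z)/(2π) and B ≈ -(fs/π)·ln|z|. The function names are hypothetical helpers for illustration, not part of the method described in this paper.

```python
import cmath
import math


def synthesize(params, N):
    """Evaluate model (1): s(n) = sum_i a_i e^{j phi_i} e^{(-d_i + j 2 pi f_i) n}.

    params: list of (a, phi, d, f) tuples; f in cycles/sample and d per sample (assumed units).
    """
    return [
        sum(a * cmath.exp(1j * phi) * cmath.exp((-d + 2j * math.pi * f) * n)
            for a, phi, d, f in params)
        for n in range(N)
    ]


def pole_form(params):
    """Return (g_i, z_i) pairs with g_i = a_i e^{j phi_i} and z_i = e^{-d_i + j 2 pi f_i}."""
    return [(a * cmath.exp(1j * phi), cmath.exp(-d + 2j * math.pi * f))
            for a, phi, d, f in params]


def formant_bandwidth(z, fs):
    """Map a pole z to (centre frequency, approximate 3-dB bandwidth), both in Hz."""
    F = cmath.phase(z) * fs / (2 * math.pi)   # pole angle -> frequency
    B = -math.log(abs(z)) * fs / math.pi      # pole radius -> bandwidth (narrowband approx.)
    return F, B


# Usage: a 500 Hz resonance with 60 Hz bandwidth sampled at 8 kHz.
fs = 8000.0
params = [(1.0, 0.3, math.pi * 60 / fs, 500 / fs),
          (0.5, -1.0, math.pi * 90 / fs, 1500 / fs)]
s = synthesize(params, 32)
F, B = formant_bandwidth(pole_form(params)[0][1], fs)
```

Since z_i = e^{-d_i + j2πf_i}, the bandwidth relation reduces to B = d_i·fs/π, which is why the damping factor in (1) maps directly to a formant bandwidth.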
The idea is to artificially move frequency peaks apart, while ensuring that no aliasing occurs, prior to parameter estimation. Conventional decimation methods used straightforward downsampling of the data, thus reducing the amount of data available for further processing. More recent methods overcome this drawback by using as many data points as possible. However, the methodology adopted in this paper also raises the issue of data configuration and its importance for overall performance.

Institute for Language and Speech Processing (ILSP), Artemidos 6 & Epidavrou, Maroussi, GR 151 25, Athens, Greece, Tel: +30-210-6875414, Fax: +30-210-6854270, Email: ptsiak@ilsp.gr

The method used here is called DESED (DEcimative Spectral Estimation by factor D) and has already been presented in [4] for decimation factor 2 and in [5] for the general case. It performs decimation by an arbitrary factor D and exploits the full data set; moreover, it does not need to reduce the dimensions of the Hankel matrix (whose elements are equal along each antidiagonal) as D increases, so dimensions of approximately N/2 can be used. The advantage of DESED lies in the fact that it benefits from the higher pole resolution obtained by decimation without being bound to the smaller Hankel matrix dimensions used by other decimative approaches. Moreover, DESED relies on Singular Value Decomposition and is a generalization of the DESE2 method proposed in [5], which performs decimation by a factor of 2. This method, along with its total least squares (TLS) counterpart DESED_TLS, has been successfully used in NMR spectroscopy, where it was compared against some of the most promising parameter estimation methods that solve the same overdetermined system of equations [4]. The aim here is to apply this method to speech spectral estimation and, furthermore, to test whether it can perform Formant/Bandwidth estimation.
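The decimation idea above can be illustrated with a small sketch. It is not the DESED algorithm as specified in Section II and in [4], [5]; it is a generic Kung-type (HSVD-like) Hankel SVD estimator in which, as an assumption for illustration, decimation by D is realized by relating rows of the signal subspace that lie D apart. All N samples still enter the Hankel matrix and its row dimension stays near N/2. Because those rows differ by the factors z_i^D, the estimated eigenvalues have pole angles spread D times further apart, and the poles are then recovered by a principal D-th root, which is valid only when |D·f_i| < 0.5 (no aliasing). The function name `decimated_hsvd` and its parameters are hypothetical.

```python
import numpy as np


def decimated_hsvd(s, p, D=1, L=None):
    """Estimate p poles z_i and complex amplitudes g_i of model (1) from samples s.

    Sketch only: Hankel SVD with a shift of D rows standing in for decimation by D.
    """
    s = np.asarray(s, dtype=complex)
    N = len(s)
    if L is None:
        L = N // 2                          # Hankel row dimension of roughly N/2
    M = N - L + 1
    # Hankel matrix H[r, c] = s[r + c]: constant along each antidiagonal, uses all samples
    H = np.array([s[r:r + M] for r in range(L)])
    U, _, _ = np.linalg.svd(H, full_matrices=False)
    Up = U[:, :p]                           # signal subspace of the p dominant components
    # Rows D apart are related by a matrix similar to diag(z_i^D)
    X, *_ = np.linalg.lstsq(Up[:-D], Up[D:], rcond=None)
    wD = np.linalg.eigvals(X)               # estimates of z_i^D (angles scaled by D)
    # Principal D-th root; correct only if |D * f_i| < 0.5 (no aliasing)
    z = np.abs(wD) ** (1.0 / D) * np.exp(1j * np.angle(wD) / D)
    # Complex amplitudes g_i by least squares on the Vandermonde model s(n) = sum g_i z_i^n
    V = z[np.newaxis, :] ** np.arange(N)[:, np.newaxis]
    g, *_ = np.linalg.lstsq(V, s, rcond=None)
    return z, g


# Usage: two closely spaced damped components, noise-free.
n = np.arange(64)
true_z = np.exp(np.array([-0.01 + 2j * np.pi * 0.10, -0.02 + 2j * np.pi * 0.12]))
true_g = np.array([1.0, 0.5 * np.exp(1j * 0.3)])
s = (true_g[np.newaxis, :] * true_z[np.newaxis, :] ** n[:, np.newaxis]).sum(axis=1)
z_est, g_est = decimated_hsvd(s, p=2, D=2)
```

The shift of D rows, rather than downsampling the data, is what keeps every sample in play while still scaling the pole angles by D; with noisy speech data, p plays the role of the model order, i.e. the requested number of poles.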
Consequently, such a robust, accurate and representative parameter estimation technique could be used for feature extraction and prove valuable in application areas such as speech synthesis and recognition. This paper presents the application of the DESED method to speech spectral estimation, where tracking formants and their respective bandwidths is an important problem. The rest of this paper is organized as follows. In Section II, the main algorithmic steps and notation of the DESED method are given. Section III explains the experimental methodology and provides the results obtained for speech spectral analysis and Formant/Bandwidth estimation on synthetic and real speech signals. Moreover, examples are given concerning formant