Comparison of the Wavelet and Short Time Fourier Transforms for Spectral Analysis of Speech Signals

Mohammad A. Tinati, Behzad Mozaffary
Faculty of Electrical and Computer Engineering, University of Tabriz
29 Bahman Blvd., Tabriz, East Azerbaijan, IRAN

Abstract: In mixtures of speech signals, the energy content of the components is important and determines the structure of the mixture. The energy content of signals is shown more clearly in the time-frequency or time-scale plane. In this paper we present a comparison of the wavelet transform (WT) and the short time Fourier Transform (STFT) for spectral analysis of speech signals. We show that in the wavelet domain speech signals are highly uncorrelated and the sparsity of the signal is increased.

Keywords: STFT, WT, uncorrelated, sparsity, ICA

1 Introduction

The blind source separation problem is a relatively new and important signal processing issue. It involves recovering unknown sources using only mixtures of them [1]. It is generally assumed that the sources are statistically independent of each other and that at most one of them is Gaussian [2]. Recently, many researchers have developed time-frequency representation (TFR) algorithms [3], which in many cases are very powerful signal processing tools. In [1] and [4] the Wigner-Ville representation is used to separate up to three speech signals from a single observed mixture, under the assumption that the time-frequency signatures of the sources are disjoint. In [5] it is assumed that speech signals are windowed disjoint orthogonal in time-frequency, which allows speech sources to be separated from two mixtures of speech signals. In [6] and [7] it is assumed that speech and music representations are sparse, and the speech signal is separated from music using frequency domain analysis. In [8] a solution to the blind source separation problem is presented that shifts the problem to the time-frequency domain and applies an independent component analysis (ICA) algorithm.
In [9] an algorithm based on the STFT is proposed for separating heart beat cycles. Other time-frequency methods applicable to different fields have been developed during the past decades; most of them can be found, with detailed references, in [10], [11], [12], [13].

5th WSEAS Int. Conf. on WAVELET ANALYSIS and MULTIRATE SYSTEMS, Sofia, Bulgaria, October 27-29, 2005 (pp31-35)

2 Background

2.1 Time-Frequency

In many applications, such as speech processing, we are interested in the frequency content of a signal localized in time, because signal parameters such as the frequency content change over time. In other words, these signals are non-stationary. For a non-stationary signal s(t), the standard Fourier Transform is of limited use: information that is localized in time, such as spikes and high frequency bursts, cannot easily be detected from it. Time localization can be achieved by first windowing the signal so as to cut out only a well-localized slice of s(t) and then taking its Fourier Transform. This gives rise to the short time Fourier Transform (STFT), or windowed Fourier Transform. The magnitude of the STFT is called the spectrogram. The Short Time Fourier Transform of a signal s(t) using a window function w(t) is defined as:

$$S(\tau, \omega) = \mathrm{STFT}(s(t)) = \int_{-\infty}^{\infty} s(t)\, w(t - \tau)\, e^{-j \omega t}\, dt \qquad (1)$$

As the window w(t) slides along the signal s(t), for each shift τ the usual Fourier Transform of the product function s(t)w(t-τ) is calculated. In two