International Journal of Computer Applications (0975 – 8887)
Volume 15 – No. 4, February 2011

Multi-Resolution Speech Spectrogram

Rohini R. Mergu
Lecturer
WIT, Solapur

Dr. Shantanu K. Dixit
Professor & Head
WIT, Solapur

ABSTRACT
The sound spectrogram is an important aid in the analysis and display of speech. It is a time-frequency-intensity display of the short-time spectrum. Speech quality can be studied by visual inspection of the spectrogram, which is one of its important applications in speech processing, especially in speech enhancement. Another application is isolating voiced and unvoiced regions. Drawing conclusions from visual inspection, however, requires a clear spectrogram. Before the spectrogram is plotted, the time-domain speech signal is converted to the frequency domain, and the transform used plays a vital role in the resolution of the spectrogram. Generally, the Fast Fourier Transform is used for this conversion. This paper discusses the effect of using different transforms to convert the time-domain speech signal into the frequency domain before plotting the spectrogram. It is observed that the resolution of the speech spectrogram is transform dependent.

Keywords
Spectrogram, Speech Enhancement, Speech Processing, Speech & Noise, Speech Quality, SNR, Resolution.

1. INTRODUCTION
In many practical situations, speech has to be recorded in the presence of undesirable background noise. As noise often degrades the quality and intelligibility of recorded speech, it is beneficial to carry out noise suppression. A variety of speech enhancement methods capable of suppressing noise have been proposed in the literature. In speech enhancement, the spectrogram, a graphical representation of speech, plays a vital role in examining speech quality.
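As a minimal sketch of the conventional FFT-based conversion described above (not the authors' implementation), the following Python snippet splits a signal into overlapping frames, windows each frame, takes one FFT per frame, and converts the magnitudes to decibels. The function name `stft_spectrogram`, the Hamming window, the 256-sample frame, the 128-sample hop, the 8 kHz sampling rate, and the synthetic 1 kHz tone standing in for speech are all assumptions made for illustration.

```python
import numpy as np


def stft_spectrogram(signal, frame_len=256, hop=128):
    """Magnitude spectrogram in dB from a short-time FFT (hypothetical helper)."""
    window = np.hamming(frame_len)
    n_frames = 1 + (len(signal) - frame_len) // hop
    # Cut the signal into overlapping frames and apply the window to each one.
    frames = np.stack([
        signal[i * hop: i * hop + frame_len] * window
        for i in range(n_frames)
    ])
    spectrum = np.fft.rfft(frames, axis=1)  # one FFT per frame
    # Small offset avoids log10(0) in silent bins.
    magnitude_db = 20.0 * np.log10(np.abs(spectrum) + 1e-12)
    return magnitude_db.T  # rows: frequency bins, columns: time frames


fs = 8000                             # assumed sampling rate in Hz
t = np.arange(fs) / fs                # one second of samples
tone = np.sin(2 * np.pi * 1000 * t)   # synthetic 1 kHz tone standing in for speech

spec = stft_spectrogram(tone)
# Locate the strongest frequency bin in the middle frame.
peak_bin = int(np.argmax(spec[:, spec.shape[1] // 2]))
peak_hz = peak_bin * fs / 256         # bin spacing is fs / frame_len
```

Displaying `spec` with time on the horizontal axis, frequency on the vertical axis, and darker shades for larger values gives a gray-scale display of the kind discussed in this paper. In this sketch, a longer frame gives finer frequency resolution at the cost of coarser time resolution.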
The quality of speech can be observed quickly using a spectrogram, which is one of its important applications in speech enhancement. Another application is isolating voiced and unvoiced regions. Drawing conclusions from visual inspection, however, requires a clear spectrogram. Before the spectrogram is plotted, the time-domain speech signal is converted to the frequency domain, and the transform used plays a vital role in the resolution of the spectrogram. Generally, the Fast Fourier Transform is used for this conversion. This paper discusses the effect of using different transforms to convert the speech signal into the frequency domain before plotting the spectrogram.

Zenton Goh, Kah-Chye Tan, and B. T. G. Tan [1] examined the spectrograms of typical clean speech, noisy speech, and enhanced speech. The horizontal axis of the spectrogram denotes time, the vertical axis denotes frequency, and the spectral magnitude is shown in gray shades (a darker shade indicates a larger value). They observed that a large portion of the clean-speech spectrogram is practically blank (i.e., unshaded) and that the speech energy is concentrated in a few isolated regions. Voiced portions of speech are characterized by dark parallel “stripes”, whereas unvoiced portions are characterized by gray patches. Some parallel stripes are horizontal while others slant up or down, indicating a change in the pitch of the speech signal. When white Gaussian noise is added to the clean speech, the blank regions of the spectrogram become shaded, and some of the stripes corresponding to voiced speech disappear. With appropriate spectral subtraction, they obtained enhanced speech whose spectrogram showed a significant reduction of the unwanted short stripes. The authors of [1] thus drew their conclusions about speech quality from observation of the spectrogram.

S. Gannot, D.
Burshtein, and Ehud Weinstein [6] presented a class of Kalman filter-based algorithms with some extensions, modifications, and improvements of previous work. The first algorithm employs the estimate-maximize (EM) method to iteratively estimate the spectral parameters of the speech and the noise parameters. The enhanced speech signal is obtained as a byproduct of the parameter estimation algorithm. They used the sound spectrogram to compare the speech quality obtained with the Kalman-EM-Iterative (KEMI) algorithm and the log spectral amplitude estimator (LSAE) algorithm.

R. C. Hendriks, R. Heusdens, and J. Jensen [2] used a deterministic model in combination with the well-known stochastic models for speech enhancement. They derived a minimum mean-square error (MMSE) estimator under a combined stochastic–deterministic speech model with speech presence uncertainty and showed that, for different distributions of the DFT coefficients, the combined model leads to improved performance. They used the speech spectrogram to classify speech components as deterministic or stochastic.

Nicholas W. D. Evans, John S. Mason, and Matt J. Roach [5] described the application of morphological filtering to speech spectrograms for noise-robust automatic speech recognition. Speech regions of the spectrogram are identified based on the proximity of high-energy regions to neighboring high-energy regions in the three-dimensional space.

H. Ding, I. Y. Soon, S. N. Koh, and C. K. Yeo [4] proposed a hybrid Wiener spectrogram filter (HWSF) for effective noise reduction, followed by a multi-blade post-processor that exploits the 2D features of the spectrogram to preserve the speech quality and further reduce the residual noise. Spectrogram comparisons show that the proposed scheme significantly reduces musical noise.

Cyril Plapous, Claude Marro, and Pascal Scalart [8] proposed a method called two-step noise reduction