Speech Enhancement using MMSE Estimation and Spectral Subtraction Methods

V.K.Gupta 1, Anirban Bhowmick 2, Mahesh Chandra 3, S.N.Saran 4
Electronics and Communication Department 1, 2, 3; Director, GNIT, Greater Noida 4
BIT, Mesra, Ranchi, India 1, 2, 3; GNIT, Greater Noida 4
guptavk76@gmail.com, talktorahul1509@gmail.com, shrotriya69@rediffmail.com, snathsharan@yahoo.com

Abstract: The efficiency of a speech recognition system is impressive in a noise-free environment, but it deteriorates drastically in the presence of environmental noise. Environmental noise also affects human-to-human and human-to-machine communication and degrades both speech quality and intelligibility. This paper proposes a speech recognition system for noisy environments. A database of ten Hindi digits was prepared for fifty speakers. Two types of noise, speech noise and F16 noise, were used to create the noisy database at different Signal-to-Noise Ratio (SNR) levels (-5 dB, 0 dB, 5 dB, 10 dB). Spectral estimation techniques based on Spectral Subtraction (SS) and Minimum Mean Square Error (MMSE) estimation were used to de-noise the speech before feature extraction. Mel Frequency Cepstral Coefficients (MFCC) were used for feature extraction and a Hidden Markov Model (HMM) was used as the classifier. The multi-band SS de-noising approach gave the best recognition results of all the techniques for both types of noise.

Keywords: MMSE; Spectral Subtraction; MFCC; HMM.

I. INTRODUCTION

Speech quality is degraded in the presence of acoustic noise. The degradation depends on the characteristics of the noise and of the environment. Speech enhancement algorithms improve the quality of the speech by reducing or eliminating the acoustic noise. Many speech enhancement algorithms have been studied and developed over the past few decades.
When clean speech is degraded by noise, the noisy signal can be represented as:

y(n) = x(n) + v(n)    (1)

In equation (1), y(n) is the noise-corrupted signal, x(n) is the clean speech and v(n) is the unwanted additive noise, which is assumed to be a zero-mean random process uncorrelated with x(n). For speech recognition in a noisy environment, x(n) has to be estimated from the noise-corrupted signal y(n). There are several methods to estimate x(n) or to reduce v(n). These techniques can be broadly categorized as (i) spectral amplitude estimation, such as Wiener filtering, spectral subtraction [1][2][3], MMSE estimation [4], geometric approach based spectral subtraction [5] and log spectral amplitude (LSA) estimation [6]; (ii) speech production model based methods [7]; (iii) hearing perceptual criteria based enhancement [8][9][10]; (iv) text-directed non-real-time speech enhancement [11]; (v) the Hidden Markov Model (HMM) method [12]; and (vi) the eigen decomposition subspace method [13].

In this paper two noises were added to the clean speech signal to obtain noisy speech at different SNR levels (-5 dB, 0 dB, 5 dB, 10 dB). Six existing speech enhancement algorithms have been implemented, broadly divided into two categories: spectral subtraction methods and MMSE based estimation methods. The spectral subtraction techniques are the spectral subtraction proposed by Berouti et al. (BSS) [14], the geometric approach to spectral subtraction (GA_SS) proposed by Yang Lu and Philipos C. Loizou [5], and the technique of Kamath and Loizou (KSS) [15], which divides the spectrum into a few contiguous frequency bands and applies a different non-linear rule in each band. The first MMSE estimation technique is the Minimum Mean-Square Error log Short Time Spectral Amplitude (STSA) estimator of Ephraim and Malah (MMSE STSA log) [6]. The other two techniques are based on the method proposed by Israel Cohen [16].
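The additive model of equation (1) and the SNR levels used for the noisy database can be illustrated with a short sketch. This is not the authors' procedure, which the paper does not detail. The function names `mix_at_snr` and `measured_snr_db`, the synthetic 440 Hz tone standing in for clean speech and the Gaussian samples standing in for recorded noise are all assumptions made for illustration. The noise is scaled by a gain chosen so that the ratio of clean-signal power to scaled-noise power equals the target SNR; plain Python lists are used only to keep the sketch free of dependencies.

```python
import math
import random


def mean_power(signal):
    """Average power (mean of squared samples) of a sequence."""
    return sum(s * s for s in signal) / len(signal)


def mix_at_snr(clean, noise, snr_db):
    """Return y(n) = x(n) + g * v(n), with g set so the mixture has snr_db."""
    if len(noise) < len(clean):
        raise ValueError("noise must be at least as long as the clean signal")
    noise = noise[:len(clean)]
    # Enforce the zero-mean assumption made for v(n) in equation (1).
    offset = sum(noise) / len(noise)
    noise = [v - offset for v in noise]
    # SNR_dB = 10 * log10(P_x / (g^2 * P_v))  =>  solve for g.
    gain = math.sqrt(mean_power(clean) / (mean_power(noise) * 10 ** (snr_db / 10)))
    return [x + gain * v for x, v in zip(clean, noise)]


def measured_snr_db(clean, noisy):
    """SNR in dB, treating (noisy - clean) as the noise component."""
    residual = [y - x for x, y in zip(clean, noisy)]
    return 10 * math.log10(mean_power(clean) / mean_power(residual))


# Hypothetical stand-ins: a 1 s, 440 Hz tone for speech and white noise.
random.seed(0)
fs = 8000
clean = [math.sin(2 * math.pi * 440 * n / fs) for n in range(fs)]
noise = [random.gauss(0.0, 1.0) for _ in range(fs)]

# One noisy version per SNR level listed in the paper.
noisy_by_snr = {snr: mix_at_snr(clean, noise, snr) for snr in (-5, 0, 5, 10)}
for snr, noisy in noisy_by_snr.items():
    print(snr, round(measured_snr_db(clean, noisy), 3))
```

With real recordings, `clean` and `noise` would be replaced by sample arrays read from audio files at a common sampling rate; the gain computation itself is unchanged.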
In one of the two Cohen-based techniques [16], a non-causal a priori SNR estimator is used and the gain is calculated with the log spectral MMSE estimator [6] (MMSE cohen log). In the other, the gain is calculated with the MMSE estimator [4] (MMSE cohen).

II. SPEECH ENHANCEMENT TECHNIQUES

A. Spectral Subtraction Techniques

Spectral subtraction is a nonparametric approach in which the noise spectrum is estimated during periods of speaker silence. In equation (1), x(n) is the clean signal and v(n) is the noise, and the two can be assumed uncorrelated. The short-term magnitude spectrum of the clean signal, $|X(\omega)|$, can therefore be estimated by subtracting the estimated noise magnitude spectrum $|\hat{V}(\omega)|$ from the noisy signal magnitude spectrum $|Y(\omega)|$. It is sufficient to use the phase spectrum of the noisy signal as an estimate of the clean speech phase:

$\hat{X}(\omega) = \left( |Y(\omega)| - |\hat{V}(\omega)| \right) e^{j \angle Y(\omega)}$    (2)

The estimated time-domain speech signal is obtained as the inverse Fourier transform of $\hat{X}(\omega)$ [17]. Another way to recover the clean signal x(n) from the noisy signal y(n) is to estimate the power spectrum of the noise, $|\hat{V}(\omega)|^2$. The power spectrum of the noise is obtained by averaging