Int. J. Electron. Commun. (AEÜ) 66 (2012) 459–464
Speech enhancement using sub-band cross-correlation compensated Wiener
filter combined with harmonic regeneration
Ch.V. Rama Rao a,∗, M.B. Rama Murthy b, K. Srinivasa Rao c
a Dept. of ECE, Gudlavalleru Engineering College, Gudlavalleru 521356, AP, India
b Dept. of ECE, CMR College of Engineering & Technology, Kandlakoya, Hyderabad 501410, AP, India
c TRR College of Engineering, Pathancheru 502319, AP, India
Article info
Article history:
Received 29 November 2010
Received in revised form 12 October 2011
Accepted 13 October 2011
Keywords:
Speech enhancement
Wiener filter
Critical band and speech harmonics
Abstract
An improved Wiener filtering method is proposed in this paper for reducing background noise added to speech in colored noise environments such as car engine noise. The implementation uses the cross-correlation between the speech and noise signals. Because noise does not affect the speech signal uniformly over the whole spectrum, a nonlinear sub-band approach with Bark-scale frequency spacing is used to reduce colored noise. However, classic short-time noise reduction techniques, including the Wiener filter, introduce harmonic distortion in the enhanced speech because their estimators are unreliable at small signal-to-noise ratios. To overcome this problem, we propose a method to regenerate the suppressed harmonics: a nonlinearity is used to restore the degraded harmonics of the distorted signal in an efficient way. Objective and subjective tests demonstrate that the proposed technique can improve the perceptual quality of speech.
© 2011 Elsevier GmbH. All rights reserved.
1. Introduction
In many speech communication systems, recovering the speech signal from speech corrupted by background noise is a challenging task, especially at low SNR (signal-to-noise ratio) values. Speech quality and intelligibility may deteriorate significantly in the presence of background noise, especially when the speech signal is subject to subsequent processing such as automatic speech recognition or speech coding. Because automatic speech processing systems are used in a variety of real-world applications, speech enhancement is becoming an increasingly important research topic. Several methods for improving the performance of speech enhancement systems are available in the literature [1–4]. Noise-corrupted speech can be enhanced using Wiener filtering [5,6], spectral subtraction rules [7] and Kalman filtering. Among these, power spectral subtraction and Wiener filtering are widely used because of their low computational complexity and impressive performance.
In general, these algorithms obtain the enhanced speech spectrum either by subtracting an estimated noise spectrum from the noisy speech spectrum or by multiplying the noisy spectrum with a gain
function. Let the noisy speech, clean speech and noise signals are
∗
Corresponding author.
E-mail addresses: chvramaraogec@gmail.com (Ch.V. Rama Rao),
mbrmurthy@gmail.com (M.B. Rama Murthy), principaltrr@gmail.com
(K. Srinivasa Rao).
denoted by y(n), x(n) and d(n) respectively in time domain. If it is
assumed that noise is additive, then y(n) can be expressed as:
y(n) = x(n) + d(n) (1)
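The additive model of Eq. (1) is easy to simulate: scale a noise signal so that the mixture has a desired SNR, then add it to the clean signal. The sketch below (a hypothetical helper using numpy, not part of the paper's method) shows one way to do this.

```python
import numpy as np

def add_noise_at_snr(x, d, snr_db):
    """Scale noise d so that y = x + alpha*d has the requested SNR in dB.

    Illustrative helper for the additive model y(n) = x(n) + d(n);
    not taken from the paper.
    """
    p_x = np.mean(x ** 2)  # clean speech power
    p_d = np.mean(d ** 2)  # noise power
    alpha = np.sqrt(p_x / (p_d * 10 ** (snr_db / 10.0)))
    return x + alpha * d   # noisy observation y(n)

# Example: a 1 kHz tone at 16 kHz sampling, corrupted by white noise at 5 dB SNR.
rng = np.random.default_rng(0)
n = np.arange(16000)
x = np.sin(2 * np.pi * 1000 * n / 16000)
d = rng.standard_normal(n.size)
y = add_noise_at_snr(x, d, snr_db=5.0)

# The achieved SNR of the mixture matches the target.
snr = 10 * np.log10(np.mean(x ** 2) / np.mean((y - x) ** 2))
```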
The speech enhancement algorithm is developed in the frequency domain. The transformation is performed using the short-time Fourier transform (STFT) because of the non-stationary nature of speech. The main reasons for working in the frequency domain are the importance of the short-time spectrum in the perception of speech [5] and the low computational cost of the transformation when the FFT algorithm is used. The task of frequency-domain speech enhancement algorithms is to produce an optimal, in some sense, estimate of the clean speech STFT once the STFT of the noisy speech is observed. Applying the fast Fourier transform (FFT) to (1), at the mth frame and kth frequency bin, y(n) can be represented as:
Y (m, k) = X(m, k) + D(m, k) (2)
where Y(m, k), X(m, k) and D(m, k) are the FFT coefficients of the noisy speech, clean speech and noise signals, respectively. An estimate of the clean speech component, denoted X̂(m, k), can be obtained by multiplying the noisy spectrum with a filter gain function W(m, k), as given in Eq. (3):

X̂(m, k) = W(m, k) Y(m, k)    (3)
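To make Eq. (3) concrete, the sketch below applies a Wiener-type gain W(k) = ξ(k)/(1 + ξ(k)) to one FFT frame, where ξ(k) is the per-bin SNR. For illustration only, ξ is computed here as an oracle from the known clean and noise spectra; a real enhancement system, including the one developed in this paper, must estimate these quantities from the noisy observation alone.

```python
import numpy as np

# One 256-sample frame of noisy speech: y = x + d, as in Eq. (1).
rng = np.random.default_rng(1)
n = np.arange(256)
x = np.sin(2 * np.pi * 0.05 * n)       # clean frame
d = 0.3 * rng.standard_normal(n.size)  # additive noise frame
y = x + d

# FFT coefficients for frame m, as in Eq. (2): Y(k) = X(k) + D(k).
win = np.hanning(n.size)
Y = np.fft.rfft(y * win)
X = np.fft.rfft(x * win)
D = np.fft.rfft(d * win)

# Oracle per-bin SNR and Wiener gain W(k) = xi / (1 + xi).
xi = (np.abs(X) ** 2) / (np.abs(D) ** 2 + 1e-12)
W = xi / (1.0 + xi)

# Eq. (3): enhanced spectrum, then back to the time domain.
X_hat = W * Y
x_hat = np.fft.irfft(X_hat, n.size)

# The gain attenuates noise-dominated bins, reducing the error
# relative to the unprocessed noisy frame.
err_before = np.mean((y - x) ** 2)
err_after = np.mean((x_hat - x * win) ** 2)
```

Note that the gain lies in [0, 1]: bins where speech dominates (large ξ) are passed nearly unchanged, while noise-dominated bins are suppressed.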
In 1984, Ephraim and Malah derived an estimator by minimizing the mean square error between the enhanced speech and the clean speech. One of the earliest algorithms that falls in the above
doi:10.1016/j.aeue.2011.10.007