Int. J. Electron. Commun. (AEÜ) 66 (2012) 459–464
Speech enhancement using sub-band cross-correlation compensated Wiener
filter combined with harmonic regeneration
Ch.V. Rama Rao a,∗, M.B. Rama Murthy b, K. Srinivasa Rao c
a Dept. of ECE, Gudlavalleru Engineering College, Gudlavalleru 521356, AP, India
b Dept. of ECE, CMR College of Engineering & Technology, Kandlakoya, Hyderabad 501410, AP, India
c TRR College of Engineering, Pathancheru 502319, AP, India
Article info
Article history:
Received 29 November 2010
Received in revised form 12 October 2011
Accepted 13 October 2011
Keywords:
Speech enhancement
Wiener filter
Critical band and speech harmonics
Abstract
An improved Wiener filtering method is proposed in this paper for reducing background noise added to speech in colored noise environments such as car engine noise. The implementation uses the cross-correlation between the speech and noise signals. Because noise does not affect the speech signal uniformly over the whole spectrum, a nonlinear sub-band approach with Bark-scale frequency spacing is used to reduce colored noise. However, classic short-time noise reduction techniques, including the Wiener filter, introduce harmonic distortion in the enhanced speech because their estimators are unreliable at small signal-to-noise ratios. To overcome this problem, we propose a method to regenerate the suppressed harmonics: a nonlinearity is used to restore the degraded harmonics of the distorted signal in an efficient way. Objective and subjective tests demonstrate that the proposed technique can improve the perceptual quality of speech.
© 2011 Elsevier GmbH. All rights reserved.
1. Introduction
In many speech communication systems, recovering the speech signal from speech corrupted by background noise is a challenging task, especially at low SNR (signal-to-noise ratio) values. Speech quality and intelligibility may deteriorate significantly in the presence of background noise, especially when the speech signal is subject to subsequent processing such as automatic speech recognition or speech coding. Because automatic speech processing systems are used in a variety of real-world applications, speech enhancement is becoming an increasingly important research topic. Several methods for improving the performance of speech enhancement systems are available in the literature [1–4]. Noise-corrupted speech can be enhanced using Wiener filtering [5,6], spectral subtraction rules [7] and Kalman filtering. Among these, power spectral subtraction and Wiener filtering are widely used because of their low computational complexity and impressive performance.
In general, these algorithms obtain the enhanced speech spectrum either by subtracting an estimated noise spectrum from the noisy speech spectrum or by multiplying the noisy spectrum with a gain
function. Let the noisy speech, clean speech and noise signals are
∗
Corresponding author.
E-mail addresses: chvramaraogec@gmail.com (Ch.V. Rama Rao),
mbrmurthy@gmail.com (M.B. Rama Murthy), principaltrr@gmail.com
(K. Srinivasa Rao).
denoted by y(n), x(n) and d(n) respectively in time domain. If it is
assumed that noise is additive, then y(n) can be expressed as:
y(n) = x(n) + d(n) (1)
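The additive model of Eq. (1) is easy to simulate: scale a noise signal so that the mixture has a desired SNR, then add it to the clean signal. The sketch below (a hypothetical helper using numpy, not part of the paper's method) shows one way to do this.

```python
import numpy as np

def add_noise_at_snr(x, d, snr_db):
    """Scale noise d so that y = x + alpha*d has the requested SNR in dB.

    Illustrative helper for the additive model y(n) = x(n) + d(n);
    not taken from the paper.
    """
    p_x = np.mean(x ** 2)  # clean speech power
    p_d = np.mean(d ** 2)  # noise power
    alpha = np.sqrt(p_x / (p_d * 10 ** (snr_db / 10.0)))
    return x + alpha * d   # noisy observation y(n)

# Example: a 1 kHz tone at 16 kHz sampling, corrupted by white noise at 5 dB SNR.
rng = np.random.default_rng(0)
n = np.arange(16000)
x = np.sin(2 * np.pi * 1000 * n / 16000)
d = rng.standard_normal(n.size)
y = add_noise_at_snr(x, d, snr_db=5.0)

# The achieved SNR of the mixture matches the target.
snr = 10 * np.log10(np.mean(x ** 2) / np.mean((y - x) ** 2))
```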
The speech enhancement algorithm is developed in the frequency domain. The transformation is performed using the short-time Fourier transform (STFT) because of the non-stationary nature of speech. The main reasons for working in the frequency domain are the importance of the short-time spectrum in the perception of speech [5] and the low computational cost of the transformation when the FFT algorithm is used. The task of frequency-domain speech enhancement algorithms is to produce an optimal, in some sense, estimate of the clean speech STFT once the STFT of the noisy speech is observed. Applying the fast Fourier transform (FFT) to (1), at the mth frame and kth frequency bin, y(n) can be represented as:
Y (m, k) = X(m, k) + D(m, k) (2)
where Y(m, k), X(m, k) and D(m, k) are the FFT coefficients of the noisy speech, clean speech and noise signals, respectively. An estimate of the clean speech component, denoted X̂(m, k), can be obtained by multiplying the noisy spectrum with a filter gain function W(m, k), as given in Eq. (3):

X̂(m, k) = W(m, k) Y(m, k)    (3)
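To make Eq. (3) concrete, the sketch below applies a Wiener-type gain W(k) = ξ(k)/(1 + ξ(k)) to one FFT frame, where ξ(k) is the per-bin SNR. For illustration only, ξ is computed here as an oracle from the known clean and noise spectra; a real enhancement system, including the one developed in this paper, must estimate these quantities from the noisy observation alone.

```python
import numpy as np

# One 256-sample frame of noisy speech: y = x + d, as in Eq. (1).
rng = np.random.default_rng(1)
n = np.arange(256)
x = np.sin(2 * np.pi * 0.05 * n)       # clean frame
d = 0.3 * rng.standard_normal(n.size)  # additive noise frame
y = x + d

# FFT coefficients for frame m, as in Eq. (2): Y(k) = X(k) + D(k).
win = np.hanning(n.size)
Y = np.fft.rfft(y * win)
X = np.fft.rfft(x * win)
D = np.fft.rfft(d * win)

# Oracle per-bin SNR and Wiener gain W(k) = xi / (1 + xi).
xi = (np.abs(X) ** 2) / (np.abs(D) ** 2 + 1e-12)
W = xi / (1.0 + xi)

# Eq. (3): enhanced spectrum, then back to the time domain.
X_hat = W * Y
x_hat = np.fft.irfft(X_hat, n.size)

# The gain attenuates noise-dominated bins, reducing the error
# relative to the unprocessed noisy frame.
err_before = np.mean((y - x) ** 2)
err_after = np.mean((x_hat - x * win) ** 2)
```

Note that the gain lies in [0, 1]: bins where speech dominates (large ξ) are passed nearly unchanged, while noise-dominated bins are suppressed.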
In 1984, Ephraim and Malah derived an estimator by minimizing the mean square error between the enhanced speech and the clean speech. One of the earliest algorithms that falls in the above
doi:10.1016/j.aeue.2011.10.007