I.J. Image, Graphics and Signal Processing, 2015, 6, 29-37
Published Online May 2015 in MECS (http://www.mecs-press.org/)
DOI: 10.5815/ijigsp.2015.06.04
Copyright © 2015 MECS
Dominant Frequency Enhancement of Speech
Signal to Improve Intelligibility and Quality
Premananda B.S.
Department of Telecommunication, R.V. College of Engineering, Bengaluru, India
Email: premanandabs@rvce.edu.in
Uma B.V.
Department of Electronics & Communication, R.V. College of Engineering, Bengaluru, India
Email: umabv@rvce.edu.in
Abstract—In mobile devices, perceived speech signal
deteriorates significantly in the presence of near-end
noise, because the noise reaches the listener's ears
directly. There is therefore a need to
increase the clarity and quality of the received speech
signal in noisy environments. This is accomplished by
incorporating speech enhancement algorithms at the
receiver end. The objective is to improve the
intelligibility and quality of the speech signal by
dynamically enhancing the speech signal when the near-end
noise dominates. This paper proposes a speech
enhancement approach that incorporates the threshold of
hearing and the auditory masking properties of the human ear.
Using the masking properties, the speech samples
that are audible can be identified. In low-SNR
environments, selected audible samples can be enhanced
to improve the clarity of the signal rather than enhancing
every loud sample. Intelligibility and quality of the
enhanced speech signal are measured using the Speech
Intelligibility Index (SII) and Perceptual Evaluation of Speech
Quality (PESQ). Experimental results show that the
proposed method improves the intelligibility and quality of
the speech signal over the unprocessed far-end speech
signal. This approach is effective in overcoming the
deterioration of speech signals in a noisy environment.
Index Terms—Dominant, Near-end noise,
Psychoacoustics, Speech enhancement, Speech
intelligibility, Speech quality
I. INTRODUCTION
Mobile devices are among the most popular consumer devices
today. For a conversation in a quiet
environment, a low speech level is sufficient for the
speakers to understand each other. However, if, for instance,
a train passes by, the conversation is severely disturbed.
To overcome this effect, the speakers must either wait until the
train passes or raise the speech amplitude to produce more
speech energy and increase the loudness. The
external volume control of a mobile phone is not a practical
remedy, because the background noise changes dynamically.
Since the noise signal itself cannot be altered, a
reasonable approach is to manipulate the far-end speech
signal based on the energy of the near-end noise. Hence, the
problem calls for the development of
speech enhancement algorithms that improve speech
perception in adverse listening conditions. The nature of
the speech enhancement differs depending on the specific
application.
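As a minimal illustration of manipulating the far-end speech according to the energy of the near-end noise, the following Python sketch raises the gain of a speech frame only when the estimated SNR falls below a target. It is not the method proposed in this paper: the function names, the 15 dB target SNR, and the 12 dB gain ceiling are assumptions chosen for illustration, and a frame of near-end noise is assumed to be available (for example, from the device microphone).

```python
import math

def frame_energy(frame):
    """Mean squared amplitude of one frame of samples."""
    return sum(x * x for x in frame) / len(frame)

def snr_db(speech_frame, noise_frame, eps=1e-12):
    """Frame SNR in dB; eps guards against division by zero and log of zero."""
    ratio = (frame_energy(speech_frame) + eps) / (frame_energy(noise_frame) + eps)
    return 10.0 * math.log10(ratio)

def adaptive_gain_db(speech_frame, noise_frame,
                     target_snr_db=15.0, max_gain_db=12.0):
    """Gain (dB) needed to reach the assumed target SNR, limited to [0, max_gain_db]."""
    deficit = target_snr_db - snr_db(speech_frame, noise_frame)
    return min(max(deficit, 0.0), max_gain_db)

def enhance_frame(speech_frame, noise_frame, **kwargs):
    """Scale the far-end speech frame by the adaptive gain."""
    gain = 10.0 ** (adaptive_gain_db(speech_frame, noise_frame, **kwargs) / 20.0)
    return [gain * x for x in speech_frame]

# Hypothetical test frames: same speech, loud versus quiet near-end noise.
speech = [0.1, -0.1] * 80
loud_noise = [0.1, -0.1] * 80
quiet_noise = [0.001, -0.001] * 80
print(adaptive_gain_db(speech, loud_noise))   # gain requested at 0 dB SNR
print(adaptive_gain_db(speech, quiet_noise))  # gain requested at high SNR
```

When the speech and noise energies are equal (0 dB SNR), the sketch requests the full gain ceiling; when the noise lies far below the speech, the frame is returned unchanged. A uniform gain of this kind amplifies every sample alike, which is the behaviour the selective, masking-based enhancement discussed in this paper aims to improve upon.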
At the receiving end, referred to as the "near-end" in the
literature, the listener may be in a noisy environment. This
makes hearing difficult even when the transmitting
speech source is in a quiet environment, because the
near-end noise reaches the listener's ears directly. The listener
experiences fatigue as the quality of the speech signal
deteriorates.
The presence of noise masks the speech signal and
makes it less intelligible or audible. This effect is called
masking and is of two types: simultaneous masking
and temporal masking. In simultaneous masking,
a signal is masked by another signal
(predominantly noise) present at the same time. In temporal
masking, the signal is masked by a loud noise that occurs
shortly before or after it. Hence, the speech signal needs to
be enhanced with both situations taken into account.
The basic idea of including masking effects in speech
signal enhancement is to remove the non-audible spectral
components of the speech signal and the masked signal.
Hence, speech enhancement involves not only increasing
the speech signal level for human listening but also
improving it further prior to listening. The objective of signal
enhancement is to improve perceptual aspects of
speech such as overall quality and intelligibility.
Speech enhancement algorithms should perform well
across a broad range of SNRs in terms of both clarity
and quality.
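To make the notion of non-audible spectral components concrete, the sketch below flags components that fall below the threshold of hearing in quiet. It relies on Terhardt's widely used approximation of the absolute threshold, which is an assumption here since this section does not state which threshold model is used. The function names and the example spectrum are hypothetical, levels are assumed to be calibrated in dB SPL, and masking by the near-end noise is not modelled.

```python
import math

def threshold_in_quiet_db(freq_hz):
    """Terhardt's approximation of the absolute threshold of hearing, in dB SPL."""
    f = freq_hz / 1000.0  # frequency in kHz
    return (3.64 * f ** -0.8
            - 6.5 * math.exp(-0.6 * (f - 3.3) ** 2)
            + 1e-3 * f ** 4)

def audible_components(components):
    """Keep (freq_hz, level_db_spl) pairs whose level exceeds the threshold in quiet."""
    return [(f, level) for f, level in components
            if level > threshold_in_quiet_db(f)]

# Hypothetical spectral components: (frequency in Hz, level in dB SPL).
spectrum = [(100.0, 10.0), (1000.0, 10.0), (3300.0, 0.0)]
print(audible_components(spectrum))
```

Under this approximation the threshold is roughly 23 dB SPL at 100 Hz, so the 100 Hz component at 10 dB SPL is discarded, while the components at 1 kHz and 3.3 kHz, where the ear is more sensitive, are retained.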
The effect of far-end noise on the speech signal can be
tackled using traditional noise suppression algorithms
such as the minimum mean-square error (MMSE) short-time
spectral amplitude (STSA) estimator [18], spectral
subtraction methods [20], etc. The far-end noise
reduction techniques discussed in the
literature [18-20] are not suitable in the present context, as
they focus on mitigating noise at the speaker end rather
than at the receiver end. Near-end noise cannot be
influenced because the listener is located in a noisy
environment, and the noise reaches the ears with hardly