Copyright © 2018 Authors. This is an open access article distributed under the Creative Commons Attribution License, which permits unrestricted
use, distribution, and reproduction in any medium, provided the original work is properly cited.
International Journal of Engineering & Technology, 7 (4.11) (2018) 231-235
Website: www.sciencepubco.com/index.php/IJET
Research paper
Speech Enhancement based on 2D Gabor Filters for Arabic
Phoneme Spoken by Malay Speakers
Ali Abd Almisreb², Nooritawati Md Tahir¹*, Ahmad Farid Abidin¹, Norashidah Md Din²
¹Faculty of Electrical Engineering, Universiti Teknologi MARA (UiTM), 40450 Shah Alam, Selangor, Malaysia
²Institute of Energy Infrastructure, Universiti Tenaga Nasional (UNITEN), 43000 Kajang, Selangor, Malaysia
*Corresponding author E-mail: nooritawati@ieee.org
Abstract
In this paper, a speech enhancement method using 2D Gabor filters is proposed. The proposed filters are used to enhance Arabic phoneme
speech signals recorded in a controlled environment, namely an indoor room. All the phonemes are spoken by
Malay speakers, who are considered non-native Arabic speakers. Speech signals corrupted by noise must be enhanced before further
processing. The effectiveness of the proposed approach is evaluated in comparison with the Wiener filter. The results show that the proposed 2D
Gabor filters perform appropriately for speech enhancement at different wavelengths.
Keywords: 2-D Gabor filters; Arabic; Malay; Wiener filter.
1. Introduction
Speech enhancement is important in many applications, such as
communications, coding systems, hearing aids, aircraft cockpits,
automatic speech recognition systems and forensics, and it therefore
remains a relevant and important research area to be explored.
Speech enhancement aims to reduce or eliminate the unwanted
components of a speech waveform in order to increase the
acceptability, clarity and intelligibility of the speech signal
without degrading the original signal. Noise in speech signals can be
classified into several types according to its time-domain and
frequency-domain characteristics: narrow-band noise, band-limited
white noise, colored noise, impulse noise and transient noise pulses.
Speech enhancement approaches, in turn, can be categorized into two
classes: single-channel and multi-channel. Single-channel approaches
enhance the intelligibility and quality of speech, whilst
multi-channel approaches can improve the quality and intelligibility
of speech by using spectral and spatial details of both speech and
noise [1]. To address these speech enhancement challenges, many
algorithms have been suggested.
For instance, a conventional speech enhancement method using
spectral subtraction was proposed in [2], followed by enhancement
methods based on the Wiener filter [3, 4] and the minimum mean
square error approach [5]. In addition, speech enhancement
algorithms based on sub-band methods have been suggested. Sub-band
adaptive filters are mainly used to identify very long impulse
responses, but these filters have a low convergence rate. They can
also be used to identify linear systems from their impulse
responses [6]. Furthermore, the main principle of speech enhancement
by thresholding the Discrete Wavelet Transform (DWT) coefficients of
the noisy speech is to first estimate the difference between the DWT
coefficients of noise and those of clean speech. Thresholding of the
DWT coefficients is then applied to reduce the noise in the speech
waveform. Recently, many researchers have focused on using wavelet
packets for speech enhancement, as in [7], which integrates a
perceptual filterbank with minimum mean square error short-time
spectral amplitude estimation. The filterbank was built according to
an undecimated wavelet packet decomposition tree. Another speech
enhancement algorithm, proposed in [8], consists of two stages that
rely on wavelet packets. The first stage is accomplished using the
wavelet transform, followed by a second stage that removes wavelet
coefficients of the noisy speech. Additionally, an enhancement
method using time-scale adaptation of wavelet thresholds was
suggested in [9]: the energy of the wavelet coefficients represents
the time dependency, while the scale dependency is represented by
extending the level-dependent threshold principle to wavelet packet
thresholding. The authors of [10] suggested a speech enhancement
system based on wavelet thresholding techniques to overcome the
limitations of the basic wavelet thresholding algorithm, namely the
white Gaussian noise assumption and poor auditory quality.
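The basic wavelet thresholding scheme referred to above can be illustrated with a minimal sketch. It is not the implementation used in any of the cited works. It assumes a single-level Haar DWT, a noise level estimated from the median absolute detail coefficient, the widely used universal threshold sigma * sqrt(2 ln N), and hard thresholding. The function names (haar_dwt, haar_idwt, hard_threshold, denoise) are hypothetical and chosen only for this illustration.

```python
import math


def haar_dwt(x):
    """One-level Haar DWT of an even-length sequence.

    Returns (approximation, detail) coefficient lists.
    """
    s = math.sqrt(2.0)
    approx = [(x[i] + x[i + 1]) / s for i in range(0, len(x), 2)]
    detail = [(x[i] - x[i + 1]) / s for i in range(0, len(x), 2)]
    return approx, detail


def haar_idwt(approx, detail):
    """Inverse of haar_dwt: rebuild the signal from both coefficient lists."""
    s = math.sqrt(2.0)
    x = []
    for a, d in zip(approx, detail):
        x.append((a + d) / s)
        x.append((a - d) / s)
    return x


def hard_threshold(coeffs, t):
    """Keep coefficients whose magnitude exceeds t; set the rest to zero."""
    return [c if abs(c) > t else 0.0 for c in coeffs]


def denoise(noisy):
    """Threshold the detail coefficients of a noisy, even-length signal."""
    approx, detail = haar_dwt(noisy)
    # Assumption: noise standard deviation estimated from the (upper)
    # median of the absolute detail coefficients, scaled by 0.6745.
    magnitudes = sorted(abs(d) for d in detail)
    sigma = magnitudes[len(magnitudes) // 2] / 0.6745
    # Assumption: universal threshold sigma * sqrt(2 ln N).
    t = sigma * math.sqrt(2.0 * math.log(len(noisy)))
    return haar_idwt(approx, hard_threshold(detail, t))
```

Calling denoise on a slowly varying signal with additive white Gaussian noise removes most of the noise energy carried by the detail coefficients. The adaptive and perceptual schemes surveyed here mainly differ in how the threshold is chosen and updated.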
As reported in [10], to address the drawbacks of basic wavelet
thresholding, a speech enhancement system based on adaptive
thresholding of wavelet packets was proposed. It requires no
voiced/unvoiced detection system; instead, a speech activity
detector is set up to update the noise statistics for colored or
non-stationary noise. Furthermore, adaptive wavelet thresholding was
developed for waveform enhancement, as discussed in [11]. The Bionic
Wavelet Transform was initially intended for voice coding but was
later used for speech enhancement by deriving an adaptive wavelet
transform from a non-linear auditory model of the cochlea. Moreover,
the wavelet packet transform was also applied to remove additive
white Gaussian noise from corrupted speech signals, and satisfactory
results were reported using a soft thresholding function. In [12], a
critical-band decomposition was developed for waveform enhancement.
The method converts the noisy signal into wavelet coefficients and
then enhances them by subtracting a threshold from the noisy
coefficients.
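One common reading of subtracting a threshold from the noisy coefficients is the soft-thresholding (shrinkage) rule. The notation below is assumed for illustration and is not taken from [12]: w is a noisy wavelet coefficient, T >= 0 is the threshold, and \hat{w} is the enhanced coefficient.

```latex
\[
  \hat{w} = \operatorname{sgn}(w)\,\max\bigl(|w| - T,\; 0\bigr)
\]
```

Under this rule, coefficients with magnitude below T are set to zero and the remaining ones are shrunk towards zero by T.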
Thresholding was accomplished using the segmental SNR and the noise
masking threshold. Another study on speech enhancement was