Microprocessors and Microsystems
Coherence based dual microphone speech enhancement technique using FPGA
Tanmay Biswas⁎, Sudhindu Bikash Mandal, Debasri Saha, Amlan Chakrabarti
A.K.Choudhury School of Information Technology, University of Calcutta, Sector-3, Salt lake City, Kolkata 700098, India
ARTICLE INFO
Keywords:
Microphone array
Time delay of arrival (TDOA)
Coherence noise
Coherence function
Speech enhancement
FPGA
System generator
ABSTRACT
This paper presents the design and implementation of a dual microphone coherence based speech
enhancement technique using a field programmable gate array (FPGA). Proper enhancement in a dual
microphone system requires estimating the time delay of arrival (TDOA) between the two microphone
signals, which is followed by the application of the proposed speech enhancement algorithm. We use a
TDOA algorithm based on the phase transform to minimize the effect of reverberation when localizing the
sound sources. A coherence based technique is used for speech enhancement, which requires no background
noise estimation. In this way, we achieve high localization accuracy and also the capability of dealing
with coherent noise. In the proposed system, the TDOA and speech enhancement processes are executed
concurrently by exploiting the parallel logic blocks of the FPGA, thus substantially increasing the
throughput of the system. We have implemented our design on a Spartan6 Lx45 FPGA device. The proposed
design has been evaluated subjectively with normal hearing listeners using a comprehensibility listening
test, and its performance has been compared to existing state of the art research works. The objective
evaluation of the proposed design also indicates a significant improvement over the existing state of
the art research works. The subjective and objective evaluations indicate that our proposed hardware is
a feasible solution for hearing aids and other hand-held devices.
1. Introduction
Speech enhancement aims to improve the quality of speech in a
noisy environment. For non-stationary noisy signals, single microphone
speech enhancement algorithms are preferred. Several speech
enhancement algorithms have been proposed in the past few years.
Spectral subtraction is a well known technique for speech noise
elimination, originally introduced by Boll [3]. An upgraded version
was introduced by Berouti et al. [4] to reduce musical noise. The
general principle behind spectral subtraction is to estimate the
noise magnitude spectrum and subtract it from the magnitude spectrum
of the noisy signal, keeping the phase of the spectrum unchanged.
More recently, Zhang et al. [5] introduced a spectral subtraction
method for speech enhancement in which the subtraction is performed
on both the real and imaginary parts of the spectrum. The multi band
spectral subtraction method for speech enhancement was introduced by
Kamath [6], where the spectrum is divided into several bands for
efficient noise reduction. A recent FPGA based implementation of
speech enhancement using spectral subtraction can be found in [7].
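The general principle of spectral subtraction described above can be illustrated with a minimal Python sketch for a single frame. This is only an illustration under stated assumptions, not the method of [3] to [7] nor the design proposed in this paper: the function names dft, idft and spectral_subtract and the floor parameter are hypothetical, the noise magnitude spectrum is assumed to be available from a speech-free frame, and a naive O(N^2) DFT is used so that the example needs only the standard library.

```python
import cmath
import math


def dft(x):
    """Naive O(N^2) discrete Fourier transform (illustration only)."""
    n = len(x)
    return [sum(x[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n))
            for k in range(n)]


def idft(spectrum):
    """Naive inverse DFT matching dft() above."""
    n = len(spectrum)
    return [sum(spectrum[k] * cmath.exp(2j * math.pi * k * t / n)
                for k in range(n)) / n
            for t in range(n)]


def spectral_subtract(noisy_frame, noise_mag, floor=0.0):
    """Subtract a noise magnitude estimate from one frame, keeping the phase.

    noise_mag: per-bin noise magnitude (assumed estimated from a speech-free frame).
    floor: hypothetical lower bound, as a fraction of the noisy magnitude.
    """
    cleaned = []
    for bin_value, noise in zip(dft(noisy_frame), noise_mag):
        mag = abs(bin_value)
        phase = cmath.phase(bin_value)            # phase is left unchanged
        new_mag = max(mag - noise, floor * mag)   # never go below the floor
        cleaned.append(cmath.rect(new_mag, phase))
    # The input is real, so only the real part of the inverse is kept.
    return [v.real for v in idft(cleaned)]


# Toy example: a tone in bin 4 corrupted by a weaker tone in bin 10.
N = 32
clean = [math.sin(2 * math.pi * 4 * t / N) for t in range(N)]
noise = [0.2 * math.sin(2 * math.pi * 10 * t / N) for t in range(N)]
noisy = [c + v for c, v in zip(clean, noise)]

noise_estimate = [abs(v) for v in dft(noise)]     # idealized noise estimate
enhanced = spectral_subtract(noisy, noise_estimate)
print(max(abs(e - c) for e, c in zip(enhanced, clean)))
```

In this toy case the noise estimate is exact and occupies different bins from the speech tone, so the subtraction recovers the clean frame almost perfectly. With real speech the estimate is imperfect, and the residual spectral peaks are what is heard as musical noise.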
In a microphone array system, we need to adjust for the time
difference between the signals to localize the sound sources. Plenty
of research work on TDOA estimation has been done in the past few
years. The primary goal of a localization system is accuracy. To
localize the sound source, time delay estimation [8] has been widely
used due to its simplicity and accuracy. Various algorithms have been
designed to estimate the time delay, with varying degrees of accuracy
and computational complexity. Conditional time frequency histograms
for the localization of sound sources were proposed in [12]. The
cross correlation (CC) method has been used to find the degree of
correlation between the signals in [9]. In CC based methods, the time
lag at which the cross correlation is maximum gives the time delay
estimate. Improved versions of the cross correlation algorithm are
known as generalized cross correlation methods [10]. The main
advantages of these algorithms are low computational cost and high
accuracy. By choosing the weighting function in a generalized cross
correlation method appropriately, we can converge to the maximum
likelihood method [11]. After investigating various time delay
estimation techniques, we have chosen the phase transform (PHAT)
algorithm to localize the sound source, as it suits the needs of real
time applications. In this algorithm, we normalize the cross power
spectrum of the
http://dx.doi.org/10.1016/j.micpro.2017.10.007
Received 28 April 2017; Received in revised form 18 August 2017; Accepted 21 October 2017
⁎ Corresponding author.
E-mail addresses: tbakc_rs@caluniv.ac.in (T. Biswas), acakcs@caluniv.ac.in (A. Chakrabarti).
Microprocessors and Microsystems 55 (2017) 111–118
Available online 23 October 2017
0141-9331/ © 2017 Elsevier B.V. All rights reserved.