Contents lists available at ScienceDirect
Microprocessors and Microsystems
journal homepage: www.elsevier.com/locate/micpro

Coherence based dual microphone speech enhancement technique using FPGA

Tanmay Biswas ⁎, Sudhindu Bikash Mandal, Debasri Saha, Amlan Chakrabarti
A.K. Choudhury School of Information Technology, University of Calcutta, Sector-3, Salt Lake City, Kolkata 700098, India

ARTICLE INFO
Keywords: Microphone array; Time delay of arrival (TDOA); Coherence noise; Coherence function; Speech enhancement; FPGA; System generator

ABSTRACT
This paper presents the design and implementation of a coherence-based dual-microphone speech enhancement technique on a field programmable gate array (FPGA). For proper enhancement in a dual-microphone system, the time delay of arrival (TDOA) between the two microphone signals must first be estimated; the proposed speech enhancement algorithm is then applied. We use a TDOA algorithm based on the phase transform to minimize the effect of reverberation when localizing the sound sources. A coherence-based technique, which requires no background noise estimation, is used for the speech enhancement process. In this way, the system achieves high localization accuracy and is also capable of dealing with coherent noise. In the proposed system, the TDOA and speech enhancement processes execute concurrently on the parallel logic blocks of the FPGA, which greatly increases the throughput of the system. We have implemented our design on a Spartan-6 LX45 FPGA device. A subjective evaluation of the proposed design was carried out with normal-hearing listeners using a comprehensibility listening test, and its performance was compared with existing state-of-the-art research works. The objective evaluation of the proposed design also indicates a significant improvement over the existing state-of-the-art works.
The subjective and objective evaluations indicate that the proposed hardware offers a feasible solution for hearing aids and other hand-held devices.

1. Introduction

Speech enhancement aims to improve the quality of speech in a noisy environment. For non-stationary noisy signals, single-microphone speech enhancement algorithms are preferred. Several speech enhancement algorithms have been proposed in the past few years. Spectral subtraction is a well-known technique for removing noise from speech; it was originally introduced by Boll [3]. An improved version was introduced by Berouti et al. [4] to reduce musical noise. The general principle behind spectral subtraction is to estimate the noise magnitude spectrum and subtract it from the magnitude spectrum of the noisy signal, leaving the phase of the spectrum unchanged. More recently, Zhang et al. [5] proposed a spectral subtraction method for speech enhancement in which the subtraction is performed on both the real and imaginary parts of the spectrum. A multi-band spectral subtraction method was introduced by Kamath [6], in which the spectrum is divided into several bands for more efficient noise reduction. A recent FPGA implementation of a speech enhancement technique based on spectral subtraction can be found in [7].

In a microphone array system, the time difference between the microphone signals must be estimated in order to localize the sound sources. Considerable research on TDOA estimation has been done in the past few years. The primary goal of a localization system is accuracy. Time delay estimation [8] has been widely used to localize sound sources because of its simplicity and accuracy. Various algorithms have been designed to estimate the time delay, with varying degrees of accuracy and computational complexity. Conditional time-frequency histograms for sound source localization were proposed in [12].
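As a minimal sketch of the spectral subtraction principle described above (it is not the coherence-based method proposed in this paper), the Python code below processes one frame of complex spectral bins, assumed to come from an FFT of a windowed frame computed elsewhere. The noise magnitude is assumed to be estimated by averaging frames that contain noise only. The function names and the over-subtraction factor `alpha` and spectral floor `beta` are hypothetical, illustrative parameters in the spirit of Berouti et al. [4].

```python
import cmath


def estimate_noise_magnitude(noise_frames):
    """Average magnitude spectrum over frames assumed to contain noise only.

    noise_frames: list of frames, each a list of complex spectral bins.
    """
    n_frames = len(noise_frames)
    n_bins = len(noise_frames[0])
    return [sum(abs(frame[k]) for frame in noise_frames) / n_frames
            for k in range(n_bins)]


def spectral_subtract(noisy_spectrum, noise_mag, alpha=1.0, beta=0.0):
    """Subtract a noise magnitude estimate from one noisy spectral frame.

    alpha: hypothetical over-subtraction factor.
    beta:  hypothetical spectral floor, as a fraction of the noise magnitude.
    alpha=1, beta=0 gives basic magnitude subtraction with half-wave
    rectification. The phase of each noisy bin is kept unchanged.
    """
    enhanced = []
    for y, n in zip(noisy_spectrum, noise_mag):
        mag = max(abs(y) - alpha * n, beta * n)  # subtract, then apply the floor
        phase = cmath.phase(y)                   # reuse the noisy phase
        enhanced.append(cmath.rect(mag, phase))  # rebuild the complex bin
    return enhanced


if __name__ == "__main__":
    noise_mag = estimate_noise_magnitude([[0.5 + 0j, 0.2j], [0.3 + 0j, -0.4j]])
    print(spectral_subtract([2 + 0j, 0.1j], noise_mag))
```

With `alpha=1.0` and `beta=0.0`, bins whose noise estimate exceeds the noisy magnitude are set to zero; a small positive `beta` leaves a residual floor instead, which is one way of limiting musical noise. The enhanced frame would then be passed through an inverse FFT and overlap-add, which this sketch omits.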
http://dx.doi.org/10.1016/j.micpro.2017.10.007
Received 28 April 2017; Received in revised form 18 August 2017; Accepted 21 October 2017
⁎ Corresponding author. E-mail addresses: tbakc_rs@caluniv.ac.in (T. Biswas), acakcs@caluniv.ac.in (A. Chakrabarti).
Microprocessors and Microsystems 55 (2017) 111–118. Available online 23 October 2017.
0141-9331/ © 2017 Elsevier B.V. All rights reserved.

The cross-correlation (CC) method has been used in [9] to find the degree of correlation between the signals. In CC-based methods, the time lag at which the cross-correlation reaches its maximum gives the time delay estimate. Improved versions of the cross-correlation algorithm are known as generalized cross-correlation (GCC) methods [10]. The main advantages of these algorithms are low computational cost and high accuracy. By choosing an appropriate weighting function, a generalized cross-correlation method can be made to converge to the maximum likelihood method [11]. Having investigated various time delay estimation techniques, we chose the phase transform (PHAT) algorithm to localize the sound source, as it suits the needs of real-time applications. In the PHAT algorithm, we normalize the cross power spectrum of the