Int. J. Signal and Imaging Systems Engineering, Vol. 9, Nos. 4/5, 2016

Copyright © 2016 Inderscience Enterprises Ltd.

A microphone array beamforming-based system for multi-talker speech separation

Adel Hidri* and Hamid Amiri

Laboratoire de Recherche: Signal, Image et Technologie de l’Information (LR-SITI),
École Nationale d’Ingénieurs de Tunis (ENIT),
BP 37, le Belvédère 1002 Tunis, Tunisia
Email: hidri_adel@yahoo.fr
Email: hamidlamiri@gmail.com
*Corresponding author

Abstract: This paper presents a Multichannel Speech Separation System (MCSS) based on a new frequency-domain beamforming method. The beamformer exploits the spatial properties of the source signals using a microphone array, and therefore relies on prior knowledge of the speakers’ positions relative to the array. The proposed beamformer is defined by two processing stages: the first maintains unit gain for the desired signal, while the second blocks the desired signal and minimises the output power of the interferers in a single step. To separate multiple speakers, multiple beamformers are used simultaneously: one beamformer is computed for each source, treating the remaining sources as interferers. We test and evaluate the proposed MCSS on real recorded mixtures extracted from the ‘Multichannel In-Car Speech Database’. The experimental results demonstrate the effectiveness of the proposed system in terms of speech separation, and the quality of the separated speech is improved compared with the state of the art.

Keywords: speech signal; beamforming; microphone arrays; multichannel speech separation; optimal filtering; spatial filter.

Reference to this paper should be made as follows: Hidri, A. and Amiri, H. (2016) ‘A microphone array beamforming-based system for multi-talker speech separation’, Int. J. Signal and Imaging Systems Engineering, Vol. 9, Nos. 4/5, pp.209–217.
Biographical notes: Adel Hidri received his PhD degree in Electrical Engineering in 2014 from the National Engineering School of Tunis (ENIT), Tunisia. He received the Master’s degree in Electrical Engineering, option: Industrial Automation Compute, from the Higher School of Science and Technology of Tunis in 2001, and the DEA in Electrical Engineering, Automation and Digital Signal Processing, from the National Engineering School of Tunis in 2004. He has been a teacher at the Higher Institute of Multimedia and Arts of Manouba, Tunisia, since 2004. He is currently a member of LR-SITI. His current research interests include spatial filtering: beamforming, microphone arrays, source separation, source extraction and noise reduction.

Hamid Amiri received the Diploma of Electrotechnics, Information Technique in 1978 and the PhD degree in 1983 from the TU Braunschweig, Germany. He obtained the Doctorate of Sciences in 1993. He was a Professor at the National Engineering School of Tunis, Tunisia, from 1987 to 2001. From 2001 to 2009, he was at the Riyadh College of Telecom and Information. Currently, he is again at ENIT, where he is the Head of LR-SITI (Research Laboratory: Signal, Image and Information Technology). His research is focused on image processing, speech processing, document processing and natural language processing.

1 Introduction

In a multi-speaker environment, the speech of the speaker of interest is contaminated by interference and noise, which lowers intelligibility for human listeners and degrades further processing such as automatic speech recognition (Sharma and Atkins, 2014; Kacur and Chudy, 2014). In practice, performance also suffers from the non-stationarity of human speech and the reverberation of the recording environment. A major challenge in this area is to extract or separate concurrent speech. To address this issue, a number of approaches have been proposed (Makino et al., 2007; Naik and Wang, 2014).
These approaches can be classified into Blind Source Separation (BSS) methods and beamforming methods (Hidri et al., 2012b). Beamforming techniques have theoretically shown great potential for extracting the speech signal of interest (Cohen et al., 2010). The concept of ‘beamforming’ refers to multichannel signal processing