Concurrent speech segregation using a microphone array for computer users

Yoshifumi CHISAKI, Tsuyoshi EIZA, Hidetoshi NAKASHIMA, Tsuyoshi USAGAWA
Department of Computer Science, Faculty of Engineering, Kumamoto University, JAPAN
{chisaki,tuie}@cs.kumamoto-u.ac.jp

Abstract

In computer-aided communication systems, such as e-learning systems, a headset is still required for speech communication in order to obtain a sufficient power level of the target speech. To achieve natural and comfortable communication without a headset, a microphone array system designed for computer users has been proposed by Usagawa et al. The advantage of the system is that all sound sources are segregated simultaneously. In the system, microphone elements are attached to each edge of a computer display. The system consists of three blocks: DOA (direction of arrival) estimation, pre-separation based on blind signal separation, and an ANF (adaptive notch filter) based on the iterative echo suppression method. Although every block affects the total performance, the ANF block in particular affects the quality of a segregated signal. The directivity of the adaptive notch filter is sufficient to segregate a speech signal; however, the quality of the segregated signal is degraded because there are dips in its frequency response. This paper shows that the frequency of a dip depends on the microphone element spacing, the DOA, and the sampling frequency. Moreover, a new method is proposed that selects a filter having no critical dips in the desired frequency range. A simulation is performed under a concurrent speech condition, with speech signals used as the target and the interference sound source. The overall power level of the target speech is set 5 dB above that of the interference.
As a result, the coherence of the segregated target speech is improved from 0.91 with the system of the previous study to 0.97 with the proposed system.

1. Introduction

There are many studies on signal separation using a microphone array. One such separation method was proposed by Usagawa et al. [1][2]. Furthermore, this microphone array algorithm has been adopted as the front end of a pitch detection system under a concurrent speech condition [3]. The system consists of three blocks: DOA (direction of arrival) estimation, pre-separation based on blind signal separation, and an adaptive notch filter based on the iterative echo suppression method. The system can segregate multiple sound sources simultaneously. However, the adaptive notch filter has dips in its frequency response, which degrade the quality of the segregated speech. In this paper, a new microphone array algorithm for computer users is proposed in order to avoid the degradation of speech quality caused by these dips.

2. Microphone array system

2.1. Overview of array processing

Assume that microphone elements are attached to the four edges of a computer display as shown in Fig. 1. Each sub-array at a frame edge consists of three elements, with element spacings d_1 and d_2. A sound source S_q (q = 1, ..., N) is positioned at (theta_q, phi_q). Figure 2 shows a block diagram of the proposed method for the case of two sound sources. The signal separation is performed frame by frame in the following steps.

In the first step, pre-separation using blind deconvolution is performed for DOA (direction of arrival) estimation. Figure 3 shows a block diagram of step 1. Blind deconvolution based on the AMUSE (algorithm for multiple unknown signals extraction) method [4] is performed at each sub-array. A direction of arrival for the first sound source is then estimated by a DSA (delay-and-sum array). In addition, a permutation process for the sound sources is performed.
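The DSA stage of step 1 can be sketched as a frequency-domain delay-and-sum scan over candidate directions: each channel is phase-aligned for a hypothesized plane-wave direction, the channels are summed, and the direction maximizing the output power is taken as the DOA. This is a generic textbook sketch under a uniform-linear-sub-array assumption, not the authors' exact implementation; the function name `dsa_doa`, the 1-degree angular grid, and the speed of sound value are illustrative choices.

```python
import numpy as np

def dsa_doa(frames, d, fs, c=343.0, angles=np.linspace(-90.0, 90.0, 181)):
    """Estimate the DOA of the dominant source with a delay-and-sum array.

    frames : (n_mics, n_samples) array, one row per microphone element
    d      : element spacing in metres (uniform linear sub-array assumed)
    fs     : sampling frequency in Hz
    Returns the candidate angle (degrees) with maximum steered output power.
    """
    n_mics, n = frames.shape
    spectra = np.fft.rfft(frames, axis=1)        # per-channel spectra
    f = np.fft.rfftfreq(n, 1.0 / fs)             # frequency bins in Hz
    powers = []
    for theta in np.deg2rad(angles):
        # relative delay of element m for a plane wave from direction theta
        taus = np.arange(n_mics) * d * np.sin(theta) / c
        # advance each channel by its delay so the channels align
        steer = np.exp(2j * np.pi * f[None, :] * taus[:, None])
        beam = (spectra * steer).sum(axis=0)     # steered, summed spectrum
        powers.append(np.sum(np.abs(beam) ** 2))
    return float(angles[int(np.argmax(powers))])
```

With only three elements per sub-array the beam is broad, so in practice the estimate is refined frame by frame; keeping the signal band below c/(2d) avoids spatial aliasing of the scan.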
In the second step, the direction of the sound source with the second highest power is estimated by the DSA. Each sound source signal is then separated by an adaptive notch filter designed for each estimated DOA, frame by frame. Finally, the output signal is obtained using a Hanning window.

In the adaptive notch filtering process, a filter is designed for each DOA. The filter has dips in its frequency response. The frequency of a dip depends on the sampling frequency, the direction of arrival, and the combination of microphone elements, that is, the lag time between an arbitrary pair of microphone elements. An automatic selection of microphone elements is proposed here in order to avoid dips in the desired frequency range.
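The dependence of the dip frequencies on the element spacing and the two DOAs can be illustrated with a simple two-element delay-and-subtract model: a filter that nulls the interferer direction has, in the target direction, a response proportional to 1 - exp(-j2*pi*f*dtau), which vanishes at integer multiples of 1/dtau, where dtau is the difference of inter-element lags of the two directions. This is a simplified model of the effect described in the text, not the paper's exact filter; the additional sampling-frequency dependence noted above (which arises when lags are quantized to integer samples) is omitted here, and the function names and telephone-band limits are illustrative assumptions.

```python
import numpy as np

C = 343.0  # speed of sound in m/s (assumed)

def dip_frequencies(d, theta_int_deg, theta_tgt_deg, f_max):
    """Dip frequencies (Hz) up to f_max of a two-element delay-and-subtract
    notch filter that nulls the interferer at theta_int_deg, as seen from the
    target direction theta_tgt_deg, for element spacing d (metres)."""
    dtau = d * abs(np.sin(np.deg2rad(theta_int_deg))
                   - np.sin(np.deg2rad(theta_tgt_deg))) / C
    if dtau == 0.0:
        return np.array([])            # directions coincide: no usable notch
    k = np.arange(1, int(f_max * dtau) + 1)
    return k / dtau                    # dips at integer multiples of 1/dtau

def select_spacing(spacings, theta_int_deg, theta_tgt_deg,
                   band=(300.0, 3400.0)):
    """Pick the element spacing whose dips least intrude on the speech band,
    mimicking the automatic element selection described in the text."""
    def in_band_dips(d):
        dips = dip_frequencies(d, theta_int_deg, theta_tgt_deg, band[1])
        return np.count_nonzero((dips >= band[0]) & (dips <= band[1]))
    return min(spacings, key=in_band_dips)
```

For example, with an interferer at 60 degrees and the target at broadside, a 0.20 m pair places a dip inside the telephone band while a 0.04 m pair does not, so the selection favors the shorter spacing for this geometry.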