AN IMPROVEMENT IN USING HERMITIAN ANGLE IN CONVOLUTIVE SPEECH BLIND SOURCE SEPARATION

Hamid Mahmoodian 1, Atefeh Soltani 2, Ali Hashemi 3

1 Electrical Faculty, Najafabad Branch, Islamic Azad University, Daneshgah Blvd., Najafabad, Iran
h_mahmoodian@pel.iaun.ac.ir
2 Electrical Faculty, Najafabad Branch, Islamic Azad University, Daneshgah Blvd., Najafabad, Iran
atefe_soltani2004@yahoo.com
3 Electrical Faculty, Majlesi Branch, Islamic Azad University, Majlesi Town, Iran
a.hashemi@iaumajlesi.ac.ir

ABSTRACT

This paper presents a time-frequency (T-F) masking method for convolutive blind source separation based on the Hermitian angle concept. The Hermitian angle is calculated between the T-F domain mixture vector and a reference vector. Two different reference vectors are used to calculate two different Hermitian angles, and these angles are then clustered with the k-means or fuzzy c-means (FCM) method to estimate the unmixing masks. The well-known permutation problem is solved by k-means clustering of the estimated masks, which are partitioned into small groups. Experimental results show that separation performance with two different reference vectors is better than with only one reference vector.

Index Terms— blind source separation (BSS), sparsity, W-disjoint orthogonality, Hermitian angle.

1. INTRODUCTION

The blind source separation problem is to extract the original signals from their mixtures, assuming that no information is available about the mixing process or the original signals. The mixing model is either instantaneous or convolutive. The problem can be stated as follows. Suppose that the source signals and the microphone outputs are denoted $s_1, s_2, \ldots, s_Q$ and $x_1, x_2, \ldots, x_P$, respectively. Convolutive BSS can then be expressed as

$$x_p(n) = \sum_{q=1}^{Q} \sum_{l=0}^{L-1} h_{pq}(l)\, s_q(n-l) \tag{1}$$

where $P$ is the number of microphones, $Q$ is the number of sources, $p = 1, \ldots, P$ and $q = 1, \ldots, Q$.
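As a rough illustration of the mixing model in (1), the following Python sketch computes each microphone signal as a sum of filtered sources. The function name `convolutive_mix`, the toy signals, and the filter values are assumptions made for illustration and do not come from the paper; samples before n = 0 are assumed to be zero.

```python
def convolutive_mix(sources, filters):
    """Mix Q source signals into P microphone signals following Eq. (1).

    sources: list of Q lists, each holding N samples s_q(n).
    filters: P x Q nested list; filters[p][q] holds the taps h_pq(0..L-1).
    Returns a list of P lists, each holding N samples x_p(n).
    """
    n_samples = len(sources[0])
    mixtures = []
    for mic_filters in filters:                      # loop over microphones p
        x_p = [0.0] * n_samples
        for s_q, h_pq in zip(sources, mic_filters):  # loop over sources q
            for n in range(n_samples):
                # Sum over filter taps l; samples before n = 0 are taken as zero.
                for l, tap in enumerate(h_pq):
                    if n - l >= 0:
                        x_p[n] += tap * s_q[n - l]
        mixtures.append(x_p)
    return mixtures


# Toy example (hypothetical values): Q = 2 sources, P = 2 microphones, L = 2 taps.
s = [[1.0, 0.0, 0.0, 0.0],        # s_1: unit impulse
     [0.0, 2.0, 0.0, 0.0]]        # s_2: delayed, scaled impulse
h = [[[1.0, 0.5], [0.0, 1.0]],    # h_11, h_12
     [[0.5, 0.0], [1.0, 0.25]]]   # h_21, h_22
x = convolutive_mix(s, h)
print(x)  # [[1.0, 0.5, 2.0, 0.0], [0.5, 2.0, 0.5, 0.0]]
```

Because the sources here are impulses, each microphone output is simply the sum of the corresponding impulse responses, shifted and scaled, which makes the result easy to check by hand against (1).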
$L$ is the mixing filter length. The microphone output vector and the samples of the $p$-th microphone output are written as

$$\mathbf{x} = [x_1, x_2, \ldots, x_P]^T, \qquad x_p = [x_p(0), \ldots, x_p(N-1)]^T$$

where 'T' is the transpose operator and $N$ is the total number of samples. The column vector of sources and the samples of the $q$-th source are defined as

$$\mathbf{s} = [s_1, s_2, \ldots, s_Q]^T, \qquad s_q = [s_q(0), \ldots, s_q(N-1)]^T$$

The impulse response from the $q$-th source to the $p$-th microphone is $h_{pq}(l)$, $l = 0, \ldots, L-1$.

Overdetermined and underdetermined BSS are distinguished by comparing the number of sources with the number of sensors (microphones). The separation criteria can be divided into methods based on higher order statistics (HOS) and methods based on second order statistics (SOS) [1]-[5]. In underdetermined BSS ($P < Q$), sparse component analysis (SCA) is the most popular method. Several methods [6]-[9] rely on the sparseness of the source signals. If the signals are sufficiently sparse, it can be assumed that the sources are rarely active simultaneously.

2. PROPOSED METHOD

2.1. Signal Transformation

In the first stage of the proposed method in Fig. 1, the time-domain mixture signals, which are sampled at frequency $f_s$, are transformed into the time-frequency domain using STFT analysis. The time-frequency transformation of (1) is

$$\mathbf{X}(k,t) = \mathbf{H}(k)\,\mathbf{S}(k,t) = \sum_{q=1}^{Q} \mathbf{H}_q(k)\, S_q(k,t) \tag{2}$$

where $\mathbf{X}(k,t)$ is the T-F transformation of the microphone output vector and $\mathbf{S}(k,t)$ is the STFT of the source signals, i.e.:

$$\mathbf{X}(k,t) = [X_1(k,t), \ldots, X_P(k,t)]^T, \qquad \mathbf{S}(k,t) = [S_1(k,t), \ldots, S_Q(k,t)]^T$$

The impulse response $\mathbf{H}(k)$ and the column vector of the impulse response for the $q$-th source in the $k$-th frequency bin are:

978-1-4799-3343-3/13/$31.00 ©2013 IEEE ICECCO 2013