AN IMPROVEMENT IN USING HERMITIAN ANGLE IN CONVOLUTIVE SPEECH BLIND
SOURCE SEPARATION
Hamid Mahmoodian (1), Atefeh Soltani (2), Ali Hashemi (3)
(1) Electrical Faculty, Najafabad Branch, Islamic Azad University, Daneshgah Blvd., Najafabad, Iran
h_mahmoodian@pel.iaun.ac.ir
(2) Electrical Faculty, Najafabad Branch, Islamic Azad University, Daneshgah Blvd., Najafabad, Iran
atefe_soltani2004@yahoo.com
(3) Electrical Faculty, Majlesi Branch, Islamic Azad University, Majlesi Town, Iran
a.hashemi@iaumajlesi.ac.ir
ABSTRACT
This paper presents a T-F masking method for convolutive
blind source separation based on the Hermitian angle concept.
The Hermitian angle is calculated between the T-F domain
mixture vector and a reference vector. Two different reference
vectors are assumed for calculating two different Hermitian
angles, and these angles are then clustered with the k-means or
fuzzy c-means (FCM) method to estimate the unmixing masks. The
well-known permutation problem is solved by k-means clustering
of the estimated masks, which are partitioned into small groups.
The experimental results show that separation performance
with two different reference vectors is better than with
only one reference vector.
Index Terms— blind source separation (BSS), sparsity ,
W-disjoint orthogonality, Hermitian angle.
1. INTRODUCTION
Blind source separation (BSS) is the problem of extracting the
original signals from their mixtures when no information about
the mixing process or the original signals is available. The
mixing model is either instantaneous or convolutive. The problem
can be stated as follows:
Suppose that the source signals and microphone outputs
are denoted $s_1, s_2, \ldots, s_Q$ and $x_1, x_2, \ldots, x_P$. Convolutive BSS
can be expressed as
$$x_p(n) = \sum_{q=1}^{Q} \sum_{l=0}^{L-1} h_{pq}(l)\, s_q(n - l)$$ (1)
where P is the number of microphones, Q is the number of
sources, $p = 1, \ldots, P$, $q = 1, \ldots, Q$, and L is the mixing filter
length. The output signal vector and the samples of the $p$-th
microphone output are written as
$$x = [x_1, x_2, \ldots, x_P]^T, \quad x_p = [x_p(0), \ldots, x_p(N - 1)]^T$$
In the previous relation, ‘T’ is the transpose operator and N is
the total number of samples. The column vector of sources and the
samples of the $q$-th source are defined as
$$s = [s_1, s_2, \ldots, s_Q]^T, \quad s_q = [s_q(0), \ldots, s_q(N - 1)]^T$$
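The mixing model in (1) can be sketched in a few lines of Python. This is a minimal illustration, not the authors' implementation: the function name `convolutive_mix`, the nested-list layout of the filters, and the toy filter values are all assumptions made for the example, and samples before n = 0 are assumed to be zero.

```python
def convolutive_mix(sources, filters):
    """Mix Q sources into P microphone signals following equation (1).

    sources: list of Q lists, each holding N samples s_q(n)
    filters: filters[p][q] is the list of L taps h_pq(l), l = 0..L-1
    Samples before n = 0 are assumed to be zero (an assumption of this sketch).
    """
    num_samples = len(sources[0])
    mixtures = []
    for mic_filters in filters:                      # loop over microphones p
        x_p = [0.0] * num_samples
        for s_q, h_pq in zip(sources, mic_filters):  # loop over sources q
            for n in range(num_samples):
                for l, tap in enumerate(h_pq):       # loop over filter taps l
                    if n - l >= 0:
                        x_p[n] += tap * s_q[n - l]
        mixtures.append(x_p)
    return mixtures


# Toy case: P = 2 microphones, Q = 2 sources, L = 2 taps (illustrative values only).
s = [[1.0, 0.0, 0.0, 0.0],
     [0.0, 1.0, 0.0, 0.0]]
h = [[[1.0, 0.5], [0.3, 0.0]],
     [[0.2, 0.0], [1.0, 0.4]]]
x = convolutive_mix(s, h)
```

Because both toy sources are unit impulses, each microphone signal is simply the sum of the corresponding filters shifted to the impulse positions, which makes the result easy to check by hand.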
The impulse response from the $q$-th source to the $p$-th microphone
is $h_{pq}(l)$, $l = 0, \ldots, L - 1$. Whether a BSS problem is
overdetermined or underdetermined depends on the number of sources
compared with the number of sensors (microphones). Separation
criteria can be divided into methods based on higher-order
statistics (HOS) and methods based on second-order statistics
(SOS) [1]-[5]. In underdetermined BSS (P < Q), sparse component
analysis (SCA) is the most popular method. Several methods [6]-[9]
rely on the sparseness of the source signals. If the signals are
sufficiently sparse, it can be assumed that the sources are
rarely active simultaneously.
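The sparseness assumption can be illustrated with a toy check of how often two sources are active in the same time-frequency bin. The sketch below is only an illustration under strong assumptions: the sources are synthetic sinusoids rather than speech, the STFT uses non-overlapping rectangular frames, and the helper names (`stft_magnitudes`, `overlap_fraction`) and the relative threshold of 0.1 are hypothetical choices, not part of the paper.

```python
import cmath
import math


def stft_magnitudes(signal, frame_len):
    """Magnitudes of a naive STFT: non-overlapping rectangular frames, bins 0..frame_len/2."""
    frames = []
    for start in range(0, len(signal) - frame_len + 1, frame_len):
        frame = signal[start:start + frame_len]
        bins = []
        for k in range(frame_len // 2 + 1):
            coeff = sum(frame[n] * cmath.exp(-2j * math.pi * k * n / frame_len)
                        for n in range(frame_len))
            bins.append(abs(coeff))
        frames.append(bins)
    return frames


def overlap_fraction(mag_a, mag_b, rel_threshold=0.1):
    """Fraction of T-F bins where both sources exceed rel_threshold times their own peak."""
    peak_a = max(max(row) for row in mag_a)
    peak_b = max(max(row) for row in mag_b)
    both = total = 0
    for row_a, row_b in zip(mag_a, mag_b):
        for a, b in zip(row_a, row_b):
            total += 1
            if a > rel_threshold * peak_a and b > rel_threshold * peak_b:
                both += 1
    return both / total


# Two toy sources that overlap fully in time but occupy different frequency bins.
frame_len, num_frames = 32, 4
n_total = frame_len * num_frames
s1 = [math.sin(2 * math.pi * 4 * n / frame_len) for n in range(n_total)]
s2 = [math.sin(2 * math.pi * 10 * n / frame_len) for n in range(n_total)]

tf_overlap = overlap_fraction(stft_magnitudes(s1, frame_len),
                              stft_magnitudes(s2, frame_len))
```

For these two sinusoids no T-F bin is shared, so each mixture bin can be attributed to a single source. Real speech is only approximately disjoint in this sense, so the overlap fraction would be small but nonzero.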
2. PROPOSED METHOD
2.1. Signal Transformation
In the first stage of the proposed method (Fig. 1), the time-domain
mixture signals, sampled at frequency $f_s$, are
transformed into the time-frequency domain using STFT
analysis. The time-frequency transformation of (1) is:
$$X(k, t) = H(k)\, S(k, t) = \sum_{q=1}^{Q} H_q(k)\, S_q(k, t)$$ (2)
where $X(k, t)$ is the T-F transformation of the microphone output
vector and $S(k, t)$ is the STFT of the source signals, i.e.:
$$X(k, t) = [X_1(k, t), \ldots, X_P(k, t)]^T, \quad S(k, t) = [S_1(k, t), \ldots, S_Q(k, t)]^T.$$
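Relation (2) replaces the time-domain convolution by one matrix-vector product per frequency bin, which generally holds only approximately and requires the STFT frame to be long relative to the mixing filters. A minimal sketch of that per-bin product is given below. The helper names (`dft`, `mixing_matrix`, `mix_bin`), the 8-point transform, and the toy filter values are assumptions made for illustration only.

```python
import cmath
import math


def dft(x, num_bins):
    """Naive DFT of x, zero-padded to num_bins points."""
    padded = list(x) + [0.0] * (num_bins - len(x))
    return [sum(padded[n] * cmath.exp(-2j * math.pi * k * n / num_bins)
                for n in range(num_bins))
            for k in range(num_bins)]


def mixing_matrix(filters, num_bins):
    """Return H with H[k][p][q] equal to the frequency response of h_pq in bin k."""
    responses = [[dft(h_pq, num_bins) for h_pq in row] for row in filters]
    num_mics, num_sources = len(filters), len(filters[0])
    return [[[responses[p][q][k] for q in range(num_sources)]
             for p in range(num_mics)]
            for k in range(num_bins)]


def mix_bin(H_k, S_kt):
    """Evaluate X(k, t) = sum_q H_q(k) S_q(k, t) for a single T-F point."""
    return [sum(H_k[p][q] * S_kt[q] for q in range(len(S_kt)))
            for p in range(len(H_k))]


# Toy filters for P = 2 microphones and Q = 2 sources (illustrative values only).
h = [[[1.0, 0.5], [0.3, 0.0]],
     [[0.2, 0.0], [1.0, 0.4]]]
H = mixing_matrix(h, 8)

# Mixture vector in bin k = 0 when both sources have unit STFT coefficients there.
X_0 = mix_bin(H[0], [1 + 0j, 1 + 0j])
```

In bin k = 0 each frequency response is just the sum of the filter taps, so `X_0` equals the row sums of the tap totals, which gives a quick sanity check of the column-wise sum in (2).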
The impulse response matrix $H(k)$ and the $q$-th source column
vector of the impulse response in the $k$-th frequency bin are:
978-1-4799-3343-3/13/$31.00 ©2013 IEEE ICECCO 2013 368