GENERALIZED RELATIVE HARMONIC COEFFICIENTS
Yonggang Hu¹, Sharon Gannot², Thushara D. Abhayapala¹
¹Audio and Acoustic Signal Processing Group, Australian National University, Canberra, Australia
²Faculty of Engineering, Bar-Ilan University, Ramat-Gan, Israel
ABSTRACT
In the literature, sound source localization in far- and near-field
scenarios is mostly addressed as two independent tasks using different
approaches. This entails the tedious task of detecting the type of sound-
field, whereas in practice there may not be a clear boundary between
the far- and near-field regimes. In contrast, this paper proposes a
multi-channel feature, denoted generalized relative harmonic coeffi-
cients (generalized RHC), in the spherical harmonics domain, which
can equally localize both far- and near-field sound sources without
requiring any adjustments. We derive the analytical expression of
this feature and summarize its unique properties, which facilitate
two single-source direction-of-arrival estimators: (i) using a full
grid search over the directional space; and (ii) a closed-form solu-
tion without any grid search. An experimental study in realistic noisy
and reverberant environments under both near-field and far-field con-
ditions validates the efficacy of the proposed algorithm.
Index Terms— Sound source localization, generalized relative
harmonic coefficients, far- and near-field scenarios.
1. INTRODUCTION
Direction-of-arrival (DOA) estimation of acoustic sources has been
extensively explored in both academia and industry, as it is a
crucial component in many spatial acoustic signal processing tech-
niques and applications, including but not limited to sound source
tracking, speech enhancement, dereverberation, separation and vir-
tual reality [1]. In the last decade, higher-order microphone arrays,
such as spherical microphone arrays, have become more widely used in
the source localization task, due to their ability to capture the spa-
tial cues of the sources. In particular, such arrays enable a spatial
decomposition of the multi-channel measurements using a set of spherical
harmonics (SH) functions [2]. Specifically, the SH decomposition
brings several advantages to source localization, such as the decou-
pled frequency-dependent and angular-dependent components, and
enhanced directivity pattern of the steering vectors over the two-
dimensional (2-D) directional space [3].
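The SH decomposition mentioned above can be sketched numerically: the snippet below evaluates an SH basis matrix at a set of microphone directions and recovers the SH coefficients of a synthetic soundfield by least squares. The array geometry, SH order, and all variable names are illustrative assumptions, not the paper's setup; `scipy.special.sph_harm` (azimuth-first argument convention) is used for the basis functions.

```python
# Sketch of a spherical-harmonic (SH) decomposition of array pressures.
# Geometry, order, and names are illustrative assumptions.
import numpy as np
from scipy.special import sph_harm  # Y_n^m(azimuth, colatitude) in SciPy

def sh_matrix(order, colat, azim):
    """Stack Y_n^m over all (n, m) up to `order`; one row per microphone."""
    cols = [sph_harm(m, n, azim, colat)   # SciPy takes azimuth first
            for n in range(order + 1)
            for m in range(-n, n + 1)]
    return np.stack(cols, axis=1)         # shape (M, (order + 1) ** 2)

rng = np.random.default_rng(0)
M, order = 32, 3                          # 32 mics, SH order 3
colat = np.arccos(rng.uniform(-1, 1, M))  # random directions on the sphere
azim = rng.uniform(0, 2 * np.pi, M)

Y = sh_matrix(order, colat, azim)
n_coeffs = (order + 1) ** 2
alpha_true = rng.standard_normal(n_coeffs) + 1j * rng.standard_normal(n_coeffs)
pressure = Y @ alpha_true                 # synthetic multi-channel measurement

# Least-squares projection recovers the SH coefficients from the pressures.
alpha_est, *_ = np.linalg.lstsq(Y, pressure, rcond=None)
```

Once the coefficients are available, localization can operate on them directly, which is what separates the frequency-dependent and angular-dependent components noted above.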
Multiple SH-domain source localization approaches are avail-
able in the literature. They can be roughly grouped into the follow-
ing types: (i) subspace methods such as multiple signal classification
(MUSIC) [4, 5] and estimation of signal parameters via rotational
invariance techniques (ESPRIT) [6–8]; (ii) beamformer-based ap-
proaches searching the directional space for the direction of maximum
power, such as the maximum steered response power [9]; (iii) pseudo-
intensity-based approaches [10, 11] using the first-order spherical
harmonic coefficients to approximate the acoustic intensity [12, 13];
and (iv) approaches using the relative harmonic coefficients, which only depend
on the source position even in a static reverberant acoustic environ-
ment [3], and are capable of localizing the sources using various
optimization strategies [14–17].
The above localization approaches mainly focus on far-field
scenarios, assuming plane-wave propagation, since the source-array
distance is presumed to be much larger than the aperture of the microphone
setup; the near-field regime becomes dominant when the
source is very close to the microphone(s). The latter case is also
very common in practice, e.g., in close-talking sce-
narios [18]. In near-field soundfields, the common strategy is to
extend the localization framework, applied in the far-field scenarios,
with proper adjustments, specifically adding the source range to the
search domain [19, 20]. Although the far- and near-field source lo-
calization share similar principles, existing algorithms mostly treat
them as independent tasks, hence necessitating proper detectors.
However, there may not be a clear distinction between the far- and
near-field scenarios, as in practice sources may have both far- and
near-field characteristics [21]. Additionally, the criterion to distin-
guish between near- and far-field soundfields, e.g. in [22], depends
on the frequency content of the signal, and may therefore require
special attention when applied to broadband signals.
By contrast, this paper targets a multi-channel solution that
can be applied to both far- and near-field scenarios, circumventing
the need to differentiate between the two soundfield types. To achieve this goal,
we propose a new spatial feature, which is a generalization of the rel-
ative harmonic coefficients we have introduced in [3, 23, 24], hence
denoted generalized RHC. In this paper, we first derive the analytical
expression of this feature. Then, we describe its unique properties,
leading to two new single-source DOA estimators: the first applies
a grid search over the directional space, and the second applies a
closed-form solution. Finally, we carry out an extensive experimen-
tal study, under both near- and far-field scenarios, to validate the
applicability of the proposed algorithms in realistic noisy and rever-
berant environments.
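The first of the two estimators outlined above, a full grid search over the 2-D directional space, follows a generic pattern that can be sketched as below. The actual cost function, built from the generalized RHC, is derived later in the paper; the `toy_cost` used here is only a stand-in that peaks at an assumed true direction, and the grid resolution is arbitrary.

```python
# Generic single-source DOA estimation by full grid search over the
# 2-D directional space. The cost function here is a placeholder; the
# paper's actual cost is built from the generalized RHC.
import numpy as np

def grid_search_doa(cost, n_elev=90, n_azim=180):
    """Return the (elevation, azimuth) pair maximizing `cost` on a grid."""
    elevs = np.linspace(0, np.pi, n_elev)
    azims = np.linspace(0, 2 * np.pi, n_azim, endpoint=False)
    scores = np.array([[cost(th, ph) for ph in azims] for th in elevs])
    i, j = np.unravel_index(np.argmax(scores), scores.shape)
    return elevs[i], azims[j]

# Toy cost, maximal at the assumed true direction (θ = 1.0, φ = 2.0 rad).
true_dir = np.array([np.sin(1.0) * np.cos(2.0),
                     np.sin(1.0) * np.sin(2.0),
                     np.cos(1.0)])

def toy_cost(th, ph):
    u = np.array([np.sin(th) * np.cos(ph),
                  np.sin(th) * np.sin(ph),
                  np.cos(th)])
    return u @ true_dir   # inner product peaks at the true DOA

th_hat, ph_hat = grid_search_doa(toy_cost)
```

The estimate is accurate up to the grid spacing; the paper's second, closed-form estimator avoids this resolution/complexity trade-off entirely.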
2. SYSTEM MODEL
2.1. Space-Domain Model
Assume a multi-channel microphone array, such as an M-channel
spherical microphone array with spherical coordinates xj = (θj, φj, r),
j = 1, ..., M, where θj, φj, r denote the elevation, azimuth and
microphone radius with respect to the local origin O. Assume a
sound source located at an unknown position Ψs = (θs, φs, rs),
with elevation θs, azimuth φs and source-array distance rs. Hence,
the sound pressure, as measured by the j-th microphone, is given by
P(xj, k) = S(k)G(xj, k) + V(xj, k),    (1)
where k = 2πf/c is the wavenumber, f is the frequency, c is
the speed of sound, P(xj, k) denotes the received pressure, S(k) de-
notes the source signal, G(xj, k) denotes the acoustic transfer func-
tion from the source to the j-th microphone, and V(xj, k) denotes
the additive, non-directional noise.
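A minimal simulation of the signal model in Eq. (1) can be written as follows. Here the general acoustic transfer function G(xj, k) is replaced by the free-field Green's function e^{ikd}/(4πd), which is only an assumption for illustration (in a room, G would include reflections); the array geometry, source position, and noise level are likewise arbitrary.

```python
# Minimal simulation of Eq. (1), assuming a free-field transfer function
# G(x_j, k) = e^{i k d_j} / (4 π d_j), with d_j the source-to-mic distance.
# Geometry and signal values are illustrative assumptions only.
import numpy as np

c = 343.0                      # speed of sound [m/s]
f = 1000.0                     # frequency [Hz]
k = 2 * np.pi * f / c          # wavenumber

rng = np.random.default_rng(1)
M, r = 8, 0.042                # 8 mics on a sphere of radius 4.2 cm
mics = rng.standard_normal((M, 3))
mics *= r / np.linalg.norm(mics, axis=1, keepdims=True)  # project to sphere
source = np.array([1.0, 0.5, 0.2])                       # source position

d = np.linalg.norm(mics - source, axis=1)                # distances d_j
G = np.exp(1j * k * d) / (4 * np.pi * d)                 # free-field ATF
S = 1.0 + 0.3j                                           # source signal S(k)
V = 1e-4 * (rng.standard_normal(M) + 1j * rng.standard_normal(M))

P = S * G + V                  # Eq. (1): pressure at each microphone
```

In the SH-domain model that follows, these M pressures are the measurements that get decomposed onto the spherical harmonics basis.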
ICASSP 2023 - 2023 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP) | 978-1-7281-6327-7/23/$31.00 ©2023 IEEE | DOI: 10.1109/ICASSP49357.2023.10095371