GENERALIZED RELATIVE HARMONIC COEFFICIENTS
Yonggang Hu¹, Sharon Gannot², Thushara D. Abhayapala¹
¹Audio and Acoustic Signal Processing Group, Australian National University, Canberra, Australia
²Faculty of Engineering, Bar-Ilan University, Ramat-Gan, Israel
ABSTRACT
In the literature, sound source localization in far- and near-field
scenarios is mostly addressed as two independent tasks using different
approaches. This entails the tedious task of detecting the type of sound-
field, whereas in practice there may not be a clear boundary between
the far- and near-field regimes. In contrast, this paper proposes a
multi-channel feature, denoted generalized relative harmonic coeffi-
cients (generalized RHC), in the spherical harmonics domain, which
can equally localize both far- and near-field sound sources without
requiring any adjustments. We derive the analytical expression of
this feature and summarize its unique properties, which facilitate
two single-source direction-of-arrival estimators: (i) using a full
grid search over the directional space; and (ii) a closed-form solu-
tion without any grid search. An experimental study in realistic noisy
and reverberant environments under both near-field and far-field con-
ditions validates the efficacy of the proposed algorithm.
Index Terms— Sound source localization, generalized relative
harmonic coefficients, far- and near-field scenarios.
1. INTRODUCTION
Direction-of-arrival (DOA) estimation of acoustic sources has been
extensively explored in both academia and industry, as it is a
crucial component in many spatial acoustic signal processing tech-
niques and applications, including but not limited to sound source
tracking, speech enhancement, dereverberation, separation and vir-
tual reality [1]. In the last decade, higher-order microphone arrays,
such as spherical microphone arrays, have become more widely used in
the source localization task, due to their ability to capture the spa-
tial cues of the sources. In particular, such arrays enable a spatial
decomposition of the multi-channel measurements using a set of spherical
harmonics (SH) functions [2]. Specifically, the SH decomposition
brings several advantages to source localization, such as the decou-
pled frequency-dependent and angular-dependent components, and
enhanced directivity pattern of the steering vectors over the two-
dimensional (2-D) directional space [3].
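The SH decomposition mentioned above can be sketched numerically: the snippet below evaluates an SH basis matrix at a set of microphone directions and recovers the SH coefficients of a synthetic soundfield by least squares. The array geometry, SH order, and all variable names are illustrative assumptions, not the paper's setup; `scipy.special.sph_harm` (azimuth-first argument convention) is used for the basis functions.

```python
# Sketch of a spherical-harmonic (SH) decomposition of array pressures.
# Geometry, order, and names are illustrative assumptions.
import numpy as np
from scipy.special import sph_harm  # Y_n^m(azimuth, colatitude) in SciPy

def sh_matrix(order, colat, azim):
    """Stack Y_n^m over all (n, m) up to `order`; one row per microphone."""
    cols = [sph_harm(m, n, azim, colat)   # SciPy takes azimuth first
            for n in range(order + 1)
            for m in range(-n, n + 1)]
    return np.stack(cols, axis=1)         # shape (M, (order + 1) ** 2)

rng = np.random.default_rng(0)
M, order = 32, 3                          # 32 mics, SH order 3
colat = np.arccos(rng.uniform(-1, 1, M))  # random directions on the sphere
azim = rng.uniform(0, 2 * np.pi, M)

Y = sh_matrix(order, colat, azim)
n_coeffs = (order + 1) ** 2
alpha_true = rng.standard_normal(n_coeffs) + 1j * rng.standard_normal(n_coeffs)
pressure = Y @ alpha_true                 # synthetic multi-channel measurement

# Least-squares projection recovers the SH coefficients from the pressures.
alpha_est, *_ = np.linalg.lstsq(Y, pressure, rcond=None)
```

Once the coefficients are available, localization can operate on them directly, which is what separates the frequency-dependent and angular-dependent components noted above.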
Multiple SH-domain source localization approaches are avail-
able in the literature. They can be roughly grouped into the follow-
ing types: (i) subspace methods such as multiple signal classification
(MUSIC) [4, 5] and estimation of signal parameters via rotational
invariance techniques (ESPRIT) [6–8]; (ii) beamformer-based ap-
proaches searching the directional space for the direction of maximum
power, such as the maximum steered response power [9]; (iii) pseudo-
intensity-based approaches [10, 11] using the first-order spherical
harmonic coefficients to approximate the acoustic intensity [12, 13];
and (iv) approaches using the relative harmonic coefficients, which only depend
on the source position even in a static reverberant acoustic environ-
ment [3], and are capable of localizing the sources using various
optimization strategies [14–17].
The above localization approaches mainly focus on far-field
scenarios, assuming plane-wave propagation, since the source-array
distance is presumed to be much larger than the aperture of the microphone
setup; the near-field regime becomes dominant when the
source is very close to the microphone(s). The latter case is also
very common in practice, e.g., in close-talking sce-
narios [18]. In near-field soundfields, the common strategy is to
extend the localization framework, applied in the far-field scenarios,
with proper adjustments, specifically adding the source range to the
search domain [19, 20]. Although the far- and near-field source lo-
calization share similar principles, existing algorithms mostly treat
them as independent tasks, hence necessitating proper detectors.
However, there may not be a clear distinction between the far- and
near-field scenarios, as in practice sources may have both far- and
near-field characteristics [21]. Additionally, the criterion to distin-
guish between near- and far-field soundfields, e.g. in [22], depends
on the frequency content of the signal, and may therefore require
special attention when applied to broadband signals.
By contrast, this paper targets a multi-channel solution that
can be applied to both far- and near-field scenarios, circumventing
the need to differentiate between the two soundfield types. To achieve this goal,
we propose a new spatial feature, which is a generalization of the rel-
ative harmonic coefficients we have introduced in [3, 23, 24], hence
denoted generalized RHC. In this paper, we first derive the analytical
expression of this feature. Then, we describe its unique properties,
leading to two new single-source DOA estimators: the first applies
a grid search over the directional space, and the second applies a
closed-form solution. Finally, we carry out an extensive experimen-
tal study, under both near- and far-field scenarios, to validate the
applicability of the proposed algorithms in realistic noisy and rever-
berant environments.
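The first of the two estimators outlined above, a full grid search over the 2-D directional space, follows a generic pattern that can be sketched as below. The actual cost function, built from the generalized RHC, is derived later in the paper; the `toy_cost` used here is only a stand-in that peaks at an assumed true direction, and the grid resolution is arbitrary.

```python
# Generic single-source DOA estimation by full grid search over the
# 2-D directional space. The cost function here is a placeholder; the
# paper's actual cost is built from the generalized RHC.
import numpy as np

def grid_search_doa(cost, n_elev=90, n_azim=180):
    """Return the (elevation, azimuth) pair maximizing `cost` on a grid."""
    elevs = np.linspace(0, np.pi, n_elev)
    azims = np.linspace(0, 2 * np.pi, n_azim, endpoint=False)
    scores = np.array([[cost(th, ph) for ph in azims] for th in elevs])
    i, j = np.unravel_index(np.argmax(scores), scores.shape)
    return elevs[i], azims[j]

# Toy cost, maximal at the assumed true direction (θ = 1.0, φ = 2.0 rad).
true_dir = np.array([np.sin(1.0) * np.cos(2.0),
                     np.sin(1.0) * np.sin(2.0),
                     np.cos(1.0)])

def toy_cost(th, ph):
    u = np.array([np.sin(th) * np.cos(ph),
                  np.sin(th) * np.sin(ph),
                  np.cos(th)])
    return u @ true_dir   # inner product peaks at the true DOA

th_hat, ph_hat = grid_search_doa(toy_cost)
```

The estimate is accurate up to the grid spacing; the paper's second, closed-form estimator avoids this resolution/complexity trade-off entirely.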
2. SYSTEM MODEL
2.1. Space-Domain Model
Assume a multi-channel microphone array, such as an M-channel
spherical microphone array with spherical coordinates xj = (θj, φj, r),
j = 1, ..., M, where θj, φj, r denote the elevation, azimuth and
microphone radius with respect to the local origin O. Assume a
sound source located at an unknown position Ψs = (θs, φs, rs),
with elevation θs, azimuth φs and source-array distance rs. Hence,
the sound pressure, as measured by the j-th microphone, is given by
P(xj, k) = S(k)G(xj, k) + V(xj, k),    (1)
where k = 2πf/c is the wavenumber, f is the frequency, c is
the speed of sound, P(xj, k) denotes the received pressure, S(k) de-
notes the source signal, G(xj, k) denotes the acoustic transfer func-
tion from the source to the j-th microphone, and V(xj, k) denotes
the additive, non-directional noise.
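A minimal simulation of the signal model in Eq. (1) can be written as follows. Here the general acoustic transfer function G(xj, k) is replaced by the free-field Green's function e^{ikd}/(4πd), which is only an assumption for illustration (in a room, G would include reflections); the array geometry, source position, and noise level are likewise arbitrary.

```python
# Minimal simulation of Eq. (1), assuming a free-field transfer function
# G(x_j, k) = e^{i k d_j} / (4 π d_j), with d_j the source-to-mic distance.
# Geometry and signal values are illustrative assumptions only.
import numpy as np

c = 343.0                      # speed of sound [m/s]
f = 1000.0                     # frequency [Hz]
k = 2 * np.pi * f / c          # wavenumber

rng = np.random.default_rng(1)
M, r = 8, 0.042                # 8 mics on a sphere of radius 4.2 cm
mics = rng.standard_normal((M, 3))
mics *= r / np.linalg.norm(mics, axis=1, keepdims=True)  # project to sphere
source = np.array([1.0, 0.5, 0.2])                       # source position

d = np.linalg.norm(mics - source, axis=1)                # distances d_j
G = np.exp(1j * k * d) / (4 * np.pi * d)                 # free-field ATF
S = 1.0 + 0.3j                                           # source signal S(k)
V = 1e-4 * (rng.standard_normal(M) + 1j * rng.standard_normal(M))

P = S * G + V                  # Eq. (1): pressure at each microphone
```

In the SH-domain model that follows, these M pressures are the measurements that get decomposed onto the spherical harmonics basis.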
ICASSP 2023 - 2023 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP) | 978-1-7281-6327-7/23/$31.00 ©2023 IEEE | DOI: 10.1109/ICASSP49357.2023.10095371