BLIND SYNCHRONIZATION IN WIRELESS SENSOR NETWORKS WITH APPLICATION TO SPEECH ENHANCEMENT

Dani Cherkassky and Sharon Gannot
Faculty of Engineering, Bar-Ilan University, Ramat-Gan, 5290002, Israel
dani.cherkassky@gmail.com ; sharon.gannot@biu.ac.il

ABSTRACT

The sampling rate offset (SRO) phenomenon in wireless acoustic sensor networks (WASNs) is considered in this work. The use of a different clock source in each node results in a drift between the nodes' signals. The aim of this work is to estimate these SROs and to re-synchronize the network, enabling coherent multi-microphone processing. First, the link between SRO and the Doppler effect is derived. Then, a wideband correlation processor for SRO estimation, which is equivalent to the continuous wavelet transform (CWT), is proposed. Finally, node synchronization is achieved by re-sampling the signals at each node. An experimental study using an actual WASN demonstrates the ability of the proposed algorithm to re-synchronize the network and to regain the performance lost due to SRO.

Index Terms— Blind synchronization, wireless acoustic sensor network, sampling rate offset, wideband correlation processing

1. INTRODUCTION

Sensor networks are found in a wide range of applications, including speech processing tasks, e.g., localization, tracking, and speech enhancement [1, 2]. In recent years, the concept of a WASN with a large number of arbitrarily deployed sensors has attracted the attention of the speech processing community. Along with the clear advantages offered by WASNs, new challenges arise. The demand for real-time acquisition and streaming of the audio data imposes severe constraints on the network layer. From the signal processing perspective, the major challenges are [2, 3]: unknown array geometry, distributed processing, scalability, and synchronization. The latter challenge is addressed in the current contribution.
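The two steps summarized in the abstract — estimating the SRO by correlating against time-scaled (Doppler-like) copies of a reference signal, and then compensating by re-sampling — can be illustrated with a minimal numerical sketch. This is not the paper's exact algorithm: the grid search below is a crude discretization of the wideband correlation processor's scale axis, and linear interpolation stands in for the polynomial-interpolation resampler; all function names, the 80 ppm offset, and the signal lengths are illustrative assumptions.

```python
import numpy as np

def scale_signal(x, factor):
    """Read x on the time grid n*factor via linear interpolation
    (a simple stand-in for a polynomial-interpolation resampler)."""
    n = np.arange(len(x))
    return np.interp(n * factor, n, x)

def ncorr(a, b):
    """Normalized correlation coefficient between two signals."""
    return abs(np.dot(a, b)) / (np.linalg.norm(a) * np.linalg.norm(b))

def estimate_sro(ref, obs, candidates):
    """Grid search over SRO hypotheses: correlate obs against
    time-scaled copies of ref (a discretized scan over the scale
    axis of a wideband correlation measure) and keep the best fit."""
    best_eps, best_corr = 0.0, -np.inf
    for eps in candidates:
        c = ncorr(scale_signal(ref, 1.0 + eps), obs)
        if c > best_corr:
            best_eps, best_corr = eps, c
    return best_eps

# Toy experiment: a node whose clock runs 80 ppm fast.
fs = 16000
ref = np.random.default_rng(0).standard_normal(4 * fs)  # reference node
true_eps = 8e-5
obs = scale_signal(ref, 1.0 + true_eps)       # asynchronous node's recording
candidates = np.arange(-2e-4, 2.01e-4, 2e-5)  # +/-200 ppm hypothesis grid
eps_hat = estimate_sro(ref, obs, candidates)

# Re-synchronize by re-sampling onto the compensated time grid.
resynced = scale_signal(obs, 1.0 / (1.0 + eps_hat))
corr_before = ncorr(obs, ref)   # drift destroys coherence within seconds
corr_after = ncorr(resynced, ref)
```

Even this coarse sketch shows why synchronization matters for coherent processing: at 80 ppm the drift exceeds one sample after about 12.5 s of audio at 16 kHz, after which the raw signals decorrelate, while the re-sampled signal stays aligned with the reference.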
As the sampling process in each node of a WASN relies on a local clock source, SROs are inevitable. In the scope of this work, we aim to analyze the effect of SRO on the performance of a speech enhancement algorithm, and to develop a method for mitigating the resulting performance degradation. We concentrate on signal processing techniques, rather than hardware-based solutions.

Pawig et al. [4] consider the SRO between the input and output channels of a single-channel echo cancellation system, using a reference signal to estimate the offset. Wehr et al. [5] consider the synchronization problem in distributed beamforming for blind source separation (BSS). They propose an algorithm for estimating the SROs based on a modulated reference signal broadcast in the WASN. Miyabe et al. [6] also consider an asynchronous microphone array with application to BSS. They propose a blind technique for SRO compensation, based on the approximation of the SRO as a time-varying delay, and calculate the maximum likelihood estimator of this delay in the short-time Fourier transform (STFT) domain. The same approximation was also used by Markovich et al. [7] in the context of an asynchronous minimum variance distortionless response (MVDR) beamformer for speech enhancement. They propose a method for estimating the SRO using a voice activity detector and the features of the noise covariance matrix in the STFT domain.

In the current contribution, an asynchronous sensor array is considered. The SRO between sensors is linked to the well-known Doppler effect. We tackle the synchronization problem by applying a wideband correlation processor [8] for SRO estimation, and subsequently applying a re-sampling procedure that utilizes polynomial interpolation.

The rest of the paper is organized as follows. In Sec. 2, the problem is formulated. In Sec. 3, a wideband correlation processor is presented. In Sec.
4, the proposed synchronization method is presented and analyzed. The performance of the proposed synchronization method is evaluated in Sec. 5. We conclude the paper with a short discussion in Sec. 6.

2. PROBLEM FORMULATION

Consider a desired and an interfering speech source impinging on an array of M microphones. The microphone signals are further corrupted by spatially white sensor noise. Denote the desired source s_d(t), the interfering source s_i(t), and the