WPD-based Noise Suppression Using Nonlinearly Weighted Threshold Quantile Estimation and Optimal Wavelet Shrinking

Tuan Van Pham and Gernot Kubin
Signal Processing and Speech Communication Laboratory
University of Technology, Graz, Austria
v.t.pham@tugraz.at ; g.kubin@ieee.org ; http://spsc.tugraz.at

Abstract

A novel speech enhancement system based on wavelet packet decomposition (WPD) is proposed. The noise level is estimated as a quantile in the wavelet threshold domain. To handle colored and non-stationary noise, the universal thresholds are weighted by a time-frequency dependent nonlinear function. Two nonlinear weighting methods, one using temporal threshold variation and one using kernel smoothing, are proposed. The weighted thresholds are smoothed and employed for wavelet shrinking with an adaptive factor that suppresses noise while preserving speech quality. The proposed system is evaluated against algorithms based on spectral subtraction, using objective measures and subjective tests, to demonstrate its superior performance.

1. Introduction

One of the challenges in speech enhancement is noise estimation, which is very difficult for non-stationary noise. In the common spectral subtraction method [1], the noise spectrum is calculated from non-speech frames that have previously been detected by voice activity detection (VAD). The so-called "musical noise" related to short-time spectral subtraction remains after the processing and has a very unnatural and disturbing quality. The optimal nonlinear spectral amplitude estimation in [2] achieves a significant noise reduction while reducing the musical noise and maintaining good speech quality.

The disadvantage of these approaches is low temporal consistency, because the VAD is unreliable and the noise estimate cannot be updated during speech periods. The quantile-based noise estimation method in [3], which is based on minimum statistics [4], overcomes the problems of VAD.
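To illustrate why a quantile-based estimate does not need a VAD, the following minimal sketch takes, for each frequency band, the q-quantile of the band energies observed over a window of frames. It is only a sketch of the general idea, not the implementation of [3] or of the system proposed here: the function name `quantile_noise_estimate`, the nearest-rank quantile rule, the default q = 0.5, and the toy energies are all assumptions made for illustration.

```python
def quantile_noise_estimate(frames, q=0.5):
    """Estimate a per-band noise level as the q-quantile over time.

    frames: list of frames, each a list of non-negative band energies.
    q: quantile in [0, 1]; 0.5 (the median) is an assumed default.
    """
    if not frames:
        raise ValueError("need at least one frame")
    num_bands = len(frames[0])
    estimate = []
    for band in range(num_bands):
        # Sort the energies observed in this band across all frames.
        history = sorted(frame[band] for frame in frames)
        # Nearest-rank style index of the q-quantile (assumed rule).
        index = min(int(q * len(history)), len(history) - 1)
        estimate.append(history[index])
    return estimate


# Toy data (hypothetical): band 0 carries speech bursts in 2 of 5 frames,
# band 1 contains noise only.
frames = [
    [1.0, 0.5],
    [9.0, 0.6],
    [1.1, 0.4],
    [8.0, 0.5],
    [0.9, 0.5],
]
noise = quantile_noise_estimate(frames, q=0.5)
```

Because the estimate is read from the sorted history of each band, the two high-energy speech frames in band 0 end up above the chosen quantile and do not affect the result, and the estimate can be refreshed on every frame, including frames that contain speech.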
In 1995, Donoho [6] proposed the wavelet shrinking method as a powerful tool to enhance noisy signals. However, the soft-thresholding function does not have high-order derivatives, which results in some artifacts in the denoised signal. More sophisticated shrinking functions have been studied in [7, 8] to improve the quality of the denoised signal.

In this paper, the denoising process operates on a sequence of buffers consisting of overlapping speech frames. The wavelet coefficients of 128 WPD channels are extracted by performing the WPD on each speech frame at the 7th analysis level. Next, these coefficients are thresholded by optimal wavelet shrinking, which is controlled by a nonlinear mechanism. Then the enhanced speech frames are reconstructed by wavelet packet reconstruction (WPR) of the denoised coefficients (see Fig. 1).

For every WPD channel of all frames in the overlapping buffer, the universal thresholds defined in [6] are calculated. Then the thresholds related to the noise levels are estimated as quantiles in the recursive buffers, which store the sorted thresholds along each WPD channel. The wavelet shrinking from [8] is applied to shrink the coefficients below the noise threshold to zero. The major contributions to the system are two new nonlinear weighting functions and an adaptive factor of the wavelet shrinking to control non-stationary and colored noise in the time-frequency domain. These nonlinear weighting functions are based on the temporal threshold variation and on kernel smoothing of frame indices after sorting. Finally, the smoothed weighted thresholds are used for optimal wavelet shrinking.

[Figure 1 block labels: noisy speech frames; WPD; universal threshold; recursive buffering and quantile; nonlinear weighting; smoothing; nonlinear mechanism; optimal wavelet shrinking; WPR; enhanced speech frames]

Figure 1: The proposed WPD-based denoising system.

2. Wavelet packet shrinking

2.1. Wavelet packet decomposition

Any signal $s[n]$ of length $N$ in the space $V_0$ can be decomposed into an approximation and a detail in the spaces $V_1$ and $W_1$, respectively. These spaces can be further split into smaller spaces $\{V_j\}$ and $\{W_j\}$, $j \in \mathbb{Z}$. In this research, we apply a full WPD, which is represented by a full binary tree. Each node $\{j, k\}$ at a deeper level is covered by the subspace $\{W_{j,k}\}$, where $j$ denotes the decomposition level ($j < \log_2 N$) and $k \in \mathbb{Z}$. This subspace is spanned by an orthonormal basis $\{\psi_{j,k}(-2^{j} n + m)\}_{m,n \in \mathbb{Z}}$. The WPD coefficients of the signal $s[n]$ are:

$$D_{j,k}[m] = \langle s[n], \psi_{j,k}(-2^{j} n + m) \rangle \tag{1}$$

The proposed denoising system is executed on all WPD channels at $j = 7$ and $k = 1, \ldots, 2^{j}$. From now on, $j$ is dropped from the notation. Using the full WPD increases the computational load but yields a highly accurate decomposition. The Daubechies wavelet db40 is selected because of its high performance in an informal listening test.

2.2. Wavelet packet shrinking

The universal threshold rule [6], using a robust estimate of the standard deviation, is applied to estimate the thresholds of the WPD coefficients as:

$$T_{k,i} = \frac{1}{\gamma_{MAD}} \, \mathrm{Median}(|D_{k,i}|) \, \sqrt{2 \log N_{k,i}}, \tag{2}$$

10.21437/Interspeech.2005-682