EXTENSION OF PRE-IMAGE SPEECH DE-NOISING BY VOICE ACTIVITY DETECTION USING A BONE-CONDUCTIVE MICROPHONE

Christina Leitner, Franz Pernkopf

Signal Processing and Speech Communication Lab
Graz University of Technology
Inffeldgasse 16c, 8010 Graz, Austria

ABSTRACT

In this paper, we use voice activity detection to improve the de-noising ability of the previously proposed pre-image iteration speech enhancement method. We use a speech database consisting of two-channel recordings where the audio signal is recorded by both a bone conductive microphone and a close-talking microphone. The bone channel is used for voice activity detection as it can be assumed to be robust against environmental noise. The pre-image iteration method is prone to residual noise around speech components – we use the voice activity detection to remove this noise. The approach is evaluated using objective quality measures of the PEASS toolbox and shows an increase of the de-noising capability compared to the original method.

Index Terms: Speech de-noising, voice activity detection, bone conductive microphone, pre-image iterations

1. INTRODUCTION

Subspace methods represent one class of speech enhancement algorithms. Usually they apply principal component analysis (PCA) to de-noise speech. In [1], we used kernel principal component analysis to de-noise speech using complex-valued spectral data. In [2], we derived the so-called pre-image iteration approach that is a simplification of the kernel PCA approach and is equivalent to forming convex combinations of noisy speech samples. This approach is computationally more efficient and introduces fewer artifacts.

In this paper, we introduce voice activity detection (VAD) to improve the de-noising performance of pre-image iterations as they are prone to residual noise around speech components.
We therefore use a database that contains stereo recordings where one channel was recorded using a close-talking microphone of a standard headset and the second channel was recorded by a bone conductive microphone that is integrated in the headset. The bone channel is robust to environmental noise and can therefore improve the performance of speech processing applications in noisy environments [3, 4]. The headset was built at the SPSC lab [5]; Figure 1 shows the prototype.

Fig. 1. Headset with integrated bone conductive microphone.

This paper is organized as follows: Section 2 explains how to apply pre-image iterations for speech de-noising. Section 3 describes the extension of the system with the VAD and gives details on the implementation. Section 4 presents experimental results and Section 5 concludes the paper.

We gratefully acknowledge funding by the Austrian Science Fund (FWF) under the project number S10610-N13.

2. PRE-IMAGE ITERATIONS

Recently, we showed that pre-image iterations, which are a simplification of kernel PCA, can be used for speech de-noising [2]. When applying kernel PCA, the data samples are implicitly transformed to a high-dimensional feature space for processing. Depending on the type of kernel, the sample in input space corresponding to the sample in feature space after processing cannot be determined directly, i.e., there is no one-to-one mapping between feature and input space. The samples in input space are called pre-images and the problem of finding them is called the pre-image problem. Several solutions have been proposed to solve the pre-image problem which we sum-