International Journal of Innovation and Applied Studies
ISSN 2028-9324 Vol. 12 No. 1 Jul. 2015, pp. 33-61
© 2015 Innovative Space of Scientific Research Journals
http://www.ijias.issr-journals.org/

Corresponding Author: Md. Shahadat Hossain

HUMAN VOICE ACTIVITY DETECTION USING WAVELET

Md. Shahadat Hossain, Ariful Islam, and Dr. Md. Rafiqul Islam

Mathematics Discipline, Science Engineering and Technology School, Khulna University, Khulna-9208

Copyright © 2015 ISSR Journals. This is an open access article distributed under the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.

ABSTRACT: Wavelets are widely used in present-day science. Many different tasks are now carried out with wavelets and the wavelet transform in MATLAB, for instance biometric recognition (fingerprint, voice, iris, face, pattern, and signature recognition), signal processing, and human voice activity detection. Among these, we discuss “Human Voice Activity Detection”. First, a human voice is recorded for a few seconds as the input sound in the MATLAB command window using a good-quality headphone. The recorded sound is then plotted, and the plot is saved for later use. Next, the image of the input sound is loaded into the MATLAB Wavelet Toolbox and analyzed using the discrete wavelet transform. During this analysis, a 10-level wavelet tree is generated with the Haar wavelet, and the original signal is reconstructed at the same time. Initially, six different voice activities of the same person are analyzed. The norm and the SNR (signal-to-noise ratio) are computed, with the SNR expressed in decibels (dB). The bit rates of three different voices are also computed.
In this way, a total of 18 experiments are carried out for five different persons: three for each person except the first, for whom six are carried out. The numerical results of the experiments are presented graphically and as histograms. All of the experiments are carried out in this way to detect human voice activity.

KEYWORDS: Wavelet, SNR, Bit rate, Human voice, Histogram.

1 INTRODUCTION

Recently, speech-based human-machine interface systems have attracted much interest, supported by rapid improvements in CPU performance. Such interfaces rely heavily on speech recognition, in which information about voice activity segments (VAS) helps to improve the recognition rate. Various methods have been proposed for voice activity detection (VAD). They use features of the speech signal such as the transition of the power [1], the harmonic structure of the spectrum [2] [11] [3], and the directionality of the signal source [4]. In these methods, the acquired speech is usually assumed to be sufficiently clean, owing to the preprocessing used in speech recognition and in compression for transmission. However, in indoor environments, where the interface is ordinarily used, there are various localized interferences arriving from particular directions, such as the sound of a closing door. For these non-stationary interferences, the conventional methods do not achieve sufficient performance, because they assume that the noise is stationary and white. Kaneda [10] [12] proposed a VAD method that is effective against these non-stationary interferences, using the high-performance speech enhancement system “AMNOR (Adaptive Microphone array for Noise Reduction)”. The method uses a microphone array to discriminate between signals by exploiting the difference in direction between speech and interference. However, the target speech and the interference must arrive from sufficiently separated directions because of the limited spatial resolution of AMNOR.
This limitation severely restricts the conditions under which the method can be applied. In this research, we propose a new method that is robust to the direction of interference, using microphone array signal processing in the wavelet domain to integrate the time, frequency, and spatial information of the speech signal.
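
As an illustration of the analysis summarized in the abstract (a 10-level Haar decomposition, reconstruction of the signal, and computation of the norm and of the SNR in dB), a minimal sketch in Python is given below. It is not the MATLAB Wavelet Toolbox procedure used in this work. The function names, the synthetic test signal standing in for a recorded voice, and the SNR definition (signal energy divided by reconstruction-error energy) are all assumptions made for illustration.

```python
import math
import random


def haar_step(x):
    """One level of the orthonormal Haar DWT: returns (approximation, detail)."""
    s = math.sqrt(2.0)
    approx = [(x[i] + x[i + 1]) / s for i in range(0, len(x), 2)]
    detail = [(x[i] - x[i + 1]) / s for i in range(0, len(x), 2)]
    return approx, detail


def haar_decompose(signal, levels):
    """Multi-level Haar decomposition; details[0] is the finest band."""
    if len(signal) % (2 ** levels) != 0:
        raise ValueError("signal length must be divisible by 2**levels")
    approx = list(signal)
    details = []
    for _ in range(levels):
        approx, detail = haar_step(approx)
        details.append(detail)
    return approx, details


def haar_reconstruct(approx, details):
    """Invert haar_decompose, starting from the coarsest level."""
    s = math.sqrt(2.0)
    x = list(approx)
    for detail in reversed(details):
        out = []
        for a, d in zip(x, detail):
            out.append((a + d) / s)
            out.append((a - d) / s)
        x = out
    return x


def l2_norm(x):
    """Euclidean norm of a sequence."""
    return math.sqrt(sum(v * v for v in x))


def snr_db(reference, estimate):
    """SNR in dB, assumed here as signal energy over error energy."""
    noise = sum((r - e) ** 2 for r, e in zip(reference, estimate))
    power = sum(r * r for r in reference)
    if noise == 0:
        return math.inf
    return 10.0 * math.log10(power / noise)


# Synthetic stand-in for a recorded voice (assumption): a 200 Hz tone plus
# weak noise, 2048 samples at 8 kHz so that 10 levels divide the length.
random.seed(0)
fs = 8000
signal = [math.sin(2 * math.pi * 200 * n / fs) + 0.05 * random.gauss(0, 1)
          for n in range(2048)]

approx, details = haar_decompose(signal, levels=10)
reconstructed = haar_reconstruct(approx, details)

# Coarser version of the signal: drop the finest detail band before inverting.
zeroed = [[0.0] * len(details[0])] + details[1:]
coarse = haar_reconstruct(approx, zeroed)

print("norm of signal:", l2_norm(signal))
print("SNR of coarse reconstruction (dB):", snr_db(signal, coarse))
```

Because the Haar transform used here is orthonormal, keeping all coefficients reproduces the input up to rounding error, and the norm of the signal equals the norm of its coefficients. The SNR only becomes informative once some coefficients are discarded or modified, as in the `coarse` example.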