International Journal of Innovation and Applied Studies
ISSN 2028-9324 Vol. 12 No. 1 Jul. 2015, pp. 33-61
© 2015 Innovative Space of Scientific Research Journals
http://www.ijias.issr-journals.org/
Corresponding Author: Md. Shahadat Hossain
HUMAN VOICE ACTIVITY DETECTION USING WAVELET
Md. Shahadat Hossain, Ariful Islam, and Dr. Md. Rafiqul Islam
Mathematics Discipline, Science Engineering and Technology School, Khulna University, Khulna-9208
Copyright © 2015 ISSR Journals. This is an open access article distributed under the Creative Commons Attribution License,
which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.
ABSTRACT: Wavelets have a wide range of uses in present-day science. Many different tasks are currently carried out using
wavelets in MATLAB. For instance, biometric recognition (fingerprint recognition, voice recognition, iris
recognition, face recognition, pattern recognition and signature recognition), signal processing, and human voice activity
detection are all performed using wavelets and the wavelet transform. Among these, this paper discusses “Human Voice
Activity Detection”. First, a human voice is recorded for a few seconds as input sound in the MATLAB command window using a
good-quality headphone. The input sound is then plotted, and the plot is saved for later use. Next, the image of the input
sound is loaded into the MATLAB Wavelet Toolbox and analyzed using the discrete wavelet transform.
During this analysis, a 10-level wavelet tree is generated using the Haar wavelet with 10 decomposition
levels, and the original signal is reconstructed at the same time. First, six different voice activities of the same
person are analyzed. The norm and the SNR (signal-to-noise ratio) are computed, with the SNR expressed in decibels
(dB). The bit rates of the three different voices are also computed. In this way, a total of 18 experiments are carried out on
five different persons, with three experiments for each person except the first. The numerical results of
the experiments are presented graphically as well as through histogram analysis. Together, these
experiments constitute the activity detection of the human voice.
KEYWORDS: Wavelet, SNR, Bit rate, Human voice, Histogram.
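The processing chain summarized in the abstract (10-level Haar decomposition, reconstruction, norm and SNR in dB) can be illustrated with a minimal sketch. It is written in plain Python rather than with the MATLAB Wavelet Toolbox, and it is not the authors' implementation. The function names, the synthetic test signal, the threshold value 0.1, the use of the L2 norm, and the SNR definition 10 log10(signal energy / error energy) are all assumptions made for illustration only.

```python
import math

SQRT2_INV = 1.0 / math.sqrt(2.0)

def haar_decompose(signal, levels):
    """Multi-level orthonormal Haar DWT.

    Returns (approx, details), where details[0] is the finest level.
    """
    if levels < 1 or len(signal) % (2 ** levels) != 0:
        raise ValueError("signal length must be a multiple of 2**levels")
    approx = [float(v) for v in signal]
    details = []
    for _ in range(levels):
        pairs = [(approx[i], approx[i + 1]) for i in range(0, len(approx), 2)]
        details.append([(a - b) * SQRT2_INV for a, b in pairs])
        approx = [(a + b) * SQRT2_INV for a, b in pairs]
    return approx, details

def haar_reconstruct(approx, details):
    """Invert haar_decompose, starting from the coarsest level."""
    signal = list(approx)
    for detail in reversed(details):
        rebuilt = []
        for a, d in zip(signal, detail):
            rebuilt.append((a + d) * SQRT2_INV)
            rebuilt.append((a - d) * SQRT2_INV)
        signal = rebuilt
    return signal

def threshold_details(details, thr):
    """Zero every detail coefficient smaller than thr (assumed, illustrative step)."""
    return [[c if abs(c) >= thr else 0.0 for c in d] for d in details]

def l2_norm(x):
    """Euclidean norm of a sequence (assumed meaning of 'norm')."""
    return math.sqrt(sum(v * v for v in x))

def snr_db(original, reconstructed):
    """SNR in dB, assuming SNR = 10*log10(signal energy / error energy)."""
    error_energy = sum((o - r) ** 2 for o, r in zip(original, reconstructed))
    if error_energy == 0.0:
        return math.inf
    signal_energy = sum(o * o for o in original)
    return 10.0 * math.log10(signal_energy / error_energy)

# Synthetic stand-in for a recorded voice: 2048 samples (a multiple of 2**10).
n = 2048
voice = [math.sin(2 * math.pi * 5 * k / n) + 0.05 * math.sin(2 * math.pi * 300 * k / n)
         for k in range(n)]

approx, details = haar_decompose(voice, levels=10)
rebuilt = haar_reconstruct(approx, threshold_details(details, 0.1))

print("L2 norm of original:", l2_norm(voice))
print("SNR after thresholding (dB):", snr_db(voice, rebuilt))
```

Without the thresholding step the Haar reconstruction is exact up to rounding, so the SNR is only informative when the reconstruction differs from the original. That is why the sketch discards small detail coefficients before rebuilding the signal.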
1 INTRODUCTION
Recently, human-machine interface systems based on speech have attracted much interest, supported by the rapid
improvement of CPU performance. A speech-based interface relies heavily on speech recognition, in which
information about voice activity segments (VAS) is effective for improving the recognition rate. Various methods have been
proposed for voice activity detection (VAD). They use features of the speech signal such as the transition of the power [1],
the harmonic structure of the spectrum [2] [11] [3], and the existence of signal source directionality [4]. In these methods,
the acquired speech is usually assumed to be sufficiently clean, owing to the preprocessing used in speech recognition and in
compression for transmission. However, in indoor environments where such an interface is ordinarily used, there are various
localized interferences arriving from particular directions, such as the sound of a closing door. For these non-stationary
interferences, conventional methods do not achieve sufficient performance, because they assume that the noise is stationary and white.
Kaneda [10] [12] proposed an effective VAD method for these non-stationary interferences, using the high-performance
speech-emphasizing system “AMNOR” (Adaptive Microphone array for Noise Reduction). The method uses a microphone
array to discriminate between signals by exploiting the difference in direction between speech and interference. However, the
target speech and the interference must arrive from sufficiently separated directions because of the spatial resolution of
AMNOR. This limitation critically restricts the conditions under which the method is applicable. In this research, we propose
a new method that is robust to the direction of interference, using microphone array signal processing in the wavelet domain
to integrate the time, frequency and spatial information of the speech signal.