Current Trends in Signal Processing
Volume 2, Issue 1, April 2012, Pages 11-25
ISSN: 2277–6176 © STM Journals 2012. All Rights Reserved
Analysis of Speech Using Different Methods
Prof. S. China Venkateswarlu¹*, Dr. K. Satya Prasad², Dr. A. SubbaRami Reddy³
¹Research Scholar, ²Rector, JNTUK, Kakinada, India
³Dean, LBRCE, Mylavaram, India
*Author for Correspondence E-mail: cvenkateswarlus@gmail.com
1. INTRODUCTION
The problem of automatically separating music
signals from speech signals has been extensively
studied. In general, approaches to this problem
consider a small set of features to be extracted
from the input signals. These features are
carefully chosen to emphasize signal
characteristics that differ between speech and
music. This project combined two
well-established features used to distinguish
speech and music, as well as a third, more novel
feature. Once the typical values of these features
were defined by a set of training data, a decision
system for classifying future samples was
chosen. Here, a simple k-nearest neighbor
algorithm was implemented to determine
whether an incoming sample should be
considered speech or music. The implementation
considered here treats each sample as a whole
and labels the entire sample as either speech or
music [1–3].
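The decision step described above can be illustrated with a minimal k-nearest neighbor sketch. It is not the authors' implementation: the function name `knn_classify`, the choice of Euclidean distance, the value k = 3, and the training vectors are all assumptions made for illustration, and the numbers are invented rather than measured.

```python
import math
from collections import Counter


def knn_classify(sample, training, k=3):
    """Label `sample` by majority vote among its k nearest training points.

    `training` is a list of (feature_vector, label) pairs.
    Euclidean distance is an assumption; the paper does not name a metric.
    """
    # Order the training points by distance to the incoming sample.
    by_distance = sorted(training, key=lambda item: math.dist(sample, item[0]))
    # Count the labels of the k closest points and return the most common one.
    votes = Counter(label for _, label in by_distance[:k])
    return votes.most_common(1)[0][0]


# Hypothetical three-feature vectors (invented values, not data from the paper).
training = [
    ((0.9, 0.6, 0.8), "speech"),
    ((0.8, 0.7, 0.7), "speech"),
    ((0.7, 0.5, 0.9), "speech"),
    ((0.2, 0.2, 0.3), "music"),
    ((0.1, 0.3, 0.2), "music"),
    ((0.3, 0.1, 0.1), "music"),
]

label = knn_classify((0.75, 0.55, 0.8), training, k=3)
```

Because each sample is labeled as a whole, a single feature vector per recording is passed to the classifier. An odd k avoids tied votes in the two-class case.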
2. SIGNAL FEATURES
A large number of signal features have been
employed for the problem of discriminating
between speech and music. This paper used two
well-established features: the zero-crossing rate
(specifically the variance of this rate) and the
percentage of low-energy periods relative to the
RMS value of the signal. The third feature used
was a novel measurement of the residual error
signal produced by linear predictive coding.
These three features were combined to improve
the robustness of the classification system and,
ideally, to offset the ambiguities of any
single feature.
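The two well-established features can be sketched as follows. This is an illustrative sketch under stated assumptions, not the paper's code: the non-overlapping framing, the frame length, and the 50% threshold in `low_energy_percentage` are assumed values, and all function names are hypothetical. The linear-prediction residual feature is not shown.

```python
import math
import statistics


def frames(signal, frame_len):
    """Split the signal into consecutive non-overlapping frames (tail dropped)."""
    return [signal[i:i + frame_len]
            for i in range(0, len(signal) - frame_len + 1, frame_len)]


def zero_crossing_rate(frame):
    """Fraction of adjacent sample pairs whose signs differ."""
    crossings = sum(1 for a, b in zip(frame, frame[1:]) if (a >= 0) != (b >= 0))
    return crossings / (len(frame) - 1)


def rms(samples):
    """Root mean square value of a sequence of samples."""
    return math.sqrt(sum(x * x for x in samples) / len(samples))


def zcr_variance(signal, frame_len):
    """Population variance of the per-frame zero-crossing rate."""
    rates = [zero_crossing_rate(f) for f in frames(signal, frame_len)]
    return statistics.pvariance(rates)


def low_energy_percentage(signal, frame_len, fraction=0.5):
    """Percentage of frames whose RMS falls below `fraction` of the signal RMS.

    The default fraction of 0.5 is an assumption, not a value from the paper.
    """
    threshold = fraction * rms(signal)
    frame_list = frames(signal, frame_len)
    low = sum(1 for f in frame_list if rms(f) < threshold)
    return 100.0 * low / len(frame_list)


# Synthetic test signal: 100 alternating samples followed by 100 silent ones.
signal = [1.0, -1.0] * 50 + [0.0] * 100
zcr_var = zcr_variance(signal, frame_len=50)
low_pct = low_energy_percentage(signal, frame_len=50)
```

For this synthetic signal, two frames alternate sign at every sample and two are silent, so the per-frame rates are 1, 1, 0, 0 and half of the frames fall below the energy threshold.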
2.1. Variance of Zero-Crossing Rate
For this feature, the number of time-domain
ABSTRACT
In speech analysis, the voiced-unvoiced decision is usually made when extracting information from
speech signals. In this paper, two measures are used to separate the voiced and unvoiced parts of
speech signals: the zero-crossing rate (ZCR) and the energy. We evaluated the results by dividing
the speech sample into segments and using the zero-crossing rate and energy calculations to separate
the voiced and unvoiced parts of speech. The results suggest that zero-crossing rates are low for voiced parts
and high for unvoiced parts, whereas the energy is high for voiced parts and low for unvoiced parts. Therefore,
these methods proved effective in separating voiced and unvoiced speech.
Keywords: Implementation, low-energy, root mean square (RMS) power, speech, variance, voiced,
unvoiced, ZCR: zero-crossing rate
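The segment-wise decision summarized in the abstract can be sketched as below. This is a minimal illustration under stated assumptions rather than the authors' procedure: the function `label_segments`, the rule combining both measures with a logical AND, the two threshold values, and the synthetic signal are all hypothetical.

```python
import math


def short_time_energy(segment):
    """Mean squared amplitude of the segment."""
    return sum(x * x for x in segment) / len(segment)


def zero_crossing_rate(segment):
    """Fraction of adjacent sample pairs whose signs differ."""
    crossings = sum(1 for a, b in zip(segment, segment[1:]) if (a >= 0) != (b >= 0))
    return crossings / (len(segment) - 1)


def label_segments(signal, seg_len, zcr_threshold, energy_threshold):
    """Label each non-overlapping segment as 'voiced' or 'unvoiced'.

    A segment is called voiced when its ZCR is low AND its energy is high.
    Both thresholds are assumed tuning parameters, not values from the paper.
    """
    labels = []
    for start in range(0, len(signal) - seg_len + 1, seg_len):
        seg = signal[start:start + seg_len]
        voiced = (zero_crossing_rate(seg) < zcr_threshold
                  and short_time_energy(seg) > energy_threshold)
        labels.append("voiced" if voiced else "unvoiced")
    return labels


# Synthetic stand-ins: a strong low-frequency sine (voiced-like) followed by
# a weak, rapidly alternating sequence (unvoiced-like).
voiced_like = [math.sin(2 * math.pi * 5 * n / 200) for n in range(200)]
unvoiced_like = [0.05 * (-1) ** n for n in range(200)]

labels = label_segments(voiced_like + unvoiced_like, seg_len=200,
                        zcr_threshold=0.3, energy_threshold=0.01)
```

In practice the thresholds would be chosen from labeled speech; the values here only separate the two synthetic segments.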