INFORMATION AND COMMUNICATION TECHNOLOGIES AND SERVICES VOLUME: 10 | NUMBER: 4 | 2012 | SPECIAL ISSUE
270 © 2012 ADVANCES IN ELECTRICAL AND ELECTRONIC ENGINEERING
FUNDAMENTAL FREQUENCY EXTRACTION METHOD USING
CENTRAL CLIPPING AND ITS IMPORTANCE FOR THE
CLASSIFICATION OF EMOTIONAL STATE
Pavol PARTILA 1, Miroslav VOZNAK 1, Martin MIKULEC 1, Jaroslav ZDRALEK 1

1 Department of Telecommunications, Faculty of Electrical Engineering and Computer Science,
VSB–Technical University of Ostrava, 17. Listopadu 15, 708 33 Ostrava-Poruba, Czech Republic
pavol.partila@vsb.cz, miroslav.voznak@vsb.cz, martin.mikulec@vsb.cz, jaroslav.zdralek@vsb.cz
Abstract. The paper deals with the classification of
emotional state. We implemented a method for extracting
the fundamental frequency of the speech signal by means
of central clipping and examined the correlation between
emotional state and fundamental speech frequency. For
this purpose, we applied an exploratory data analysis
approach. An ANOVA (Analysis of Variance) test
confirmed that a change in the speaker's emotional
state alters the fundamental frequency of the human vocal
tract. The main contribution of the paper lies in the
investigation of the central clipping method by means of ANOVA.
Keywords
Central clipping, DC offset, emotional state, feature
extraction, Hamming smoothing window,
homoscedasticity, pre-emphasis.
1. Introduction
Man-machine interaction is a desirable trend,
accompanied by an effort to improve the quality of
mutual communication. On the other hand, the information
presented by synthetic speech from a computer's
loudspeaker lacks credibility. Speech generated by
Text-to-Speech tools sounds artificial because it does
not take the emotional state into account.
In speech, the emotional state is characterized by
specific phonetic features, which include the intensity,
intonation and timbre of speech. In the domain of speech
processing, speech signals are described by parameters
such as signal energy, zero-crossing rate and
fundamental speech frequency, or by cepstral coefficients
[1], [2], [3].
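As a brief illustration of two of these parameters, the short-time energy and zero-crossing rate can be computed frame by frame. The sketch below is ours, not taken from the paper; the frame length, hop size and test tone are illustrative choices:

```python
import numpy as np

def short_time_features(x, frame_len=256, hop=128):
    """Per-frame signal energy and zero-crossing rate."""
    starts = range(0, len(x) - frame_len + 1, hop)
    frames = [x[i:i + frame_len] for i in starts]
    energy = np.array([np.sum(f ** 2) for f in frames])
    # Zero-crossing rate: fraction of adjacent sample pairs that change sign.
    zcr = np.array([np.mean(np.abs(np.diff(np.sign(f))) > 0) for f in frames])
    return energy, zcr

fs = 8000                              # sampling frequency in Hz
t = np.arange(fs) / fs                 # one second of samples
x = np.sin(2 * np.pi * 200 * t)        # 200 Hz test tone
energy, zcr = short_time_features(x)   # zcr stays near 2 * 200 / fs = 0.05
```

For a pure tone the zero-crossing rate is close to twice the tone frequency divided by the sampling frequency, which is why it serves as a crude voicing and pitch cue.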
2. Pre-processing
Once human speech is digitized, the digital audio
record can be analyzed. In order to extract signatures
such as the fundamental speech signal frequency, energy,
etc., it is necessary to carry out the several operations
depicted in Fig. 1. These steps must be completed
before the above-mentioned signatures can be
extracted [4].
DC Offset Removal → Pre-emphasis → Segmentation → Windowing
Fig. 1: Pre-processing of speech signals.
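The chain in Fig. 1 can be sketched in a few lines of NumPy. This is only an illustrative implementation under our own assumptions: a typical pre-emphasis coefficient of 0.95 and frame sizes of our choosing, neither of which is specified in the text above:

```python
import numpy as np

def preprocess(x, frame_len=256, hop=128, alpha=0.95):
    """Fig. 1 chain: DC offset removal, pre-emphasis,
    segmentation into overlapping frames, Hamming windowing."""
    x = x - np.mean(x)                           # remove the DC component
    x = np.append(x[0], x[1:] - alpha * x[:-1])  # pre-emphasis: y[n] = x[n] - a*x[n-1]
    n_frames = 1 + (len(x) - frame_len) // hop
    frames = np.stack([x[i * hop:i * hop + frame_len] for i in range(n_frames)])
    return frames * np.hamming(frame_len)        # taper each frame's edges

fs = 8000
x = 0.3 + np.sin(2 * np.pi * 150 * np.arange(fs) / fs)  # tone with a DC offset
frames = preprocess(x)                                   # shape: (n_frames, frame_len)
```

The Hamming window at the end corresponds to the windowing block of Fig. 1 and to the "Hamming smoothing window" listed among the keywords.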
2.1. DC Offset
A number of audio cards add a DC (Direct Current)
component to the audio signal, as depicted in Fig. 2.
Digital signal processing techniques are applied to
compute the individual signatures, and a DC component
in the signal negatively affects this computation and
may cause disturbances.
Fig. 2: Effect of DC Offset on speech signal.
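In practice the DC component can be estimated as the sample mean and subtracted. A minimal sketch with a synthetic offset follows; the 0.2 offset and the test tone are illustrative values of our own, not data from the paper:

```python
import numpy as np

fs = 8000
t = np.arange(fs) / fs
x = np.sin(2 * np.pi * 150 * t) + 0.2   # tone with an added DC offset of 0.2
dc = np.mean(x)                          # estimate of the DC component
x_clean = x - dc                         # signal with the offset removed
```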
It is therefore necessary to remove the DC
component before processing. The DC component of