Dysphonic Speech
Reconstruction
Verbal communication is one of the most influential
and effective ways of social communication. Voiced
sounds are produced when the vocal cords vibrate:
the flow of air from the lungs to the vocal tract
is periodically interrupted, and quasi-periodic pulses of air
are produced during excitation. Dysphonia is a functional
disorder of the larynx that results from pathologic vibration
of the vocal cords. Chronic dysphonia occurs in the presence
of organic lesions (such as polyps, nodules, and Reinke’s
edema) in the vocal cords, lethal larynx diseases, throat
cancer, neurological disorders, and chronic irritation due to
smoking. The exhaled air, which is sent to the trachea to
produce voice, can make the vocal cords vibrate barely or not
at all because of the pathological formation in the patient’s
vocal cords. As a result, the voice comes out as a low whisper
and is more cracked than usual.
After total laryngectomy, there are three well-established
methods to restore the voice. The first is alaryngeal speech,
in which the patient speaks by using an electrolarynx. The
second is training the patient and helping them speak in
esophageal speech. The third is speaking by using a
tracheoesophageal voice prosthesis. Even though the resulting
speech is not of the same quality as the patient’s previous
voice, patients can speak through an artificial larynx, a voice
prosthesis, or esophageal speech. However, these methods
cannot be applied to patients with vocal-cord paralysis or
organic lesions of the vocal cords, or to patients who suffer
from dysphonia due to a partial laryngectomy, in which some
parts of the larynx and vocal cords are removed. For these
patients, solutions such as voice therapy and/or surgery may
not restore speech at all.
Several systems that analyze and enhance the characteristics
of esophageal speech and electrolarynx speech have been
designed so far [1]–[5]. However, no research reported in the
literature produces a synthetic voice digitally from the
patient’s own voice in cases where the patient was treated
with partial laryngectomy or had completely lost speech as a
result of organic lesions on the vocal cords or of vocal-cord
paralysis.
In this article, we present a novel system that delivers
synthetic speech with a quality close to natural by
reconstructing dysphonic speech. We believe that it will be an
important improvement in the social lives of patients,
enabling effective and efficient communication.
Acoustic Characteristics of Dysphonic Speech
Chronic dysphonia mainly occurs because of malfunctioning
of the vocal cords. Voice formed this way demonstrates
whisperlike characteristics. Dysphonic speech differs
from normally phonated speech in terms of voicing, pitch, and
formant structure. Spectrograms of normal and dysphonic
speech for the Turkish word “Çalışma” (IPA codes of the
characters: ç = tʃ and ş = ʃ) are given in Figure 1.
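As a rough illustration of how spectrograms of this kind are computed, the following sketch builds a short-time magnitude spectrogram for two synthetic signals: a harmonic tone standing in for a normally phonated vowel and Gaussian noise standing in for a whisper. The signals, the 8 kHz sampling rate, the 125 Hz fundamental, the frame and hop sizes, and all function names are assumptions made for illustration; they are not the recordings or analysis settings behind Figure 1. The code uses only the Python standard library, so the DFT is written out naively; a practical analysis would normally rely on an FFT library.

```python
import cmath
import math
import random

FS = 8000          # assumed sampling rate in Hz
FRAME_LEN = 256    # assumed frame length (32 ms at 8 kHz)
HOP = 128          # assumed hop size (50% overlap)


def spectrogram(signal, frame_len=FRAME_LEN, hop=HOP):
    """Return a list of magnitude spectra, one per Hann-windowed frame."""
    window = [0.5 - 0.5 * math.cos(2 * math.pi * n / frame_len)
              for n in range(frame_len)]
    n_bins = frame_len // 2 + 1
    # Precompute the DFT basis once; it is reused for every frame.
    basis = [[cmath.exp(-2j * math.pi * k * n / frame_len)
              for n in range(frame_len)] for k in range(n_bins)]
    frames = []
    for start in range(0, len(signal) - frame_len + 1, hop):
        frame = [signal[start + n] * window[n] for n in range(frame_len)]
        frames.append([abs(sum(x * b for x, b in zip(frame, row)))
                       for row in basis])
    return frames


def mean_spectrum(frames):
    """Average the magnitude spectra over time, bin by bin."""
    return [sum(column) / len(frames) for column in zip(*frames)]


def peak_to_mean(spectrum):
    """Large for a harmonic (voiced-like) spectrum, small for a flat one."""
    return max(spectrum) / (sum(spectrum) / len(spectrum))


def synthetic_voiced(n_samples, f0=125.0, n_harmonics=5):
    """Harmonic tone used as a stand-in for a normally phonated vowel."""
    return [sum(math.sin(2 * math.pi * h * f0 * n / FS) / h
                for h in range(1, n_harmonics + 1))
            for n in range(n_samples)]


def synthetic_whisper(n_samples, seed=0):
    """Gaussian noise used as a stand-in for whisperlike excitation."""
    rng = random.Random(seed)
    return [rng.gauss(0.0, 1.0) for _ in range(n_samples)]


voiced_spec = mean_spectrum(spectrogram(synthetic_voiced(2048)))
whisper_spec = mean_spectrum(spectrogram(synthetic_whisper(2048)))

# Frequency of the strongest component in the voiced-like signal.
peak_bin = max(range(len(voiced_spec)), key=lambda k: voiced_spec[k])
print("strongest voiced component (Hz):", peak_bin * FS / FRAME_LEN)
print("peak-to-mean, voiced :", round(peak_to_mean(voiced_spec), 1))
print("peak-to-mean, whisper:", round(peak_to_mean(whisper_spec), 1))
```

In this toy setting the voiced-like signal concentrates its energy in narrow harmonic lines, whereas the noise signal spreads it across all bins, which is the qualitative contrast a spectrogram makes visible.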
Figure 1 clearly shows that, contrary to the voiced pho-
nemes of normal speech, there is no perceivable pitch period
or voicing observed in the voiced phonemes of dysphonic
speech. In addition, voiced phonemes of dysphonic
speech differ from the voiced phonemes of normal speech in
terms of formant distortion: the formant bandwidths of
dysphonic phonemes are larger, and their formant frequencies
are higher.
However, in unvoiced phonemes of dysphonic speech, there is
no significant formant distortion observed [5]. Differences
between dysphonic speech and normal speech are summarized
in Table 1 in terms of pitch, voicing, and formant distortion
characteristics.
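The absence of a pitch period can also be checked numerically. The sketch below is an illustrative assumption, not the analysis used in this article: it estimates voicing from the peak of the normalized autocorrelation inside a plausible pitch-lag range. The synthetic test signals, the 8 kHz sampling rate, the 60 to 400 Hz search range, and the 0.5 voicing threshold are all hypothetical choices.

```python
import math
import random

FS = 8000  # assumed sampling rate in Hz


def normalized_autocorr(frame, lag):
    """Correlation between the frame and itself shifted by `lag`,
    normalized by the energies of the two overlapping segments."""
    a = frame[:len(frame) - lag]
    b = frame[lag:]
    num = sum(x * y for x, y in zip(a, b))
    den = math.sqrt(sum(x * x for x in a) * sum(y * y for y in b))
    return num / den if den > 0 else 0.0


def pitch_and_voicing(frame, fs=FS, f_min=60.0, f_max=400.0, threshold=0.5):
    """Return (f0 estimate in Hz, or None if unvoiced; peak correlation)."""
    lag_min = int(fs / f_max)
    lag_max = int(fs / f_min)
    best_lag, best_corr = None, -1.0
    for lag in range(lag_min, lag_max + 1):
        corr = normalized_autocorr(frame, lag)
        # Require a clear improvement so that ties between a period and
        # its multiples resolve to the shortest lag.
        if corr > best_corr + 1e-9:
            best_lag, best_corr = lag, corr
    if best_corr < threshold:
        return None, best_corr
    return fs / best_lag, best_corr


# Synthetic stand-ins: a 125 Hz harmonic tone and Gaussian noise.
voiced = [sum(math.sin(2 * math.pi * h * 125.0 * n / FS) / h
              for h in range(1, 6))
          for n in range(512)]
rng = random.Random(0)
whisper = [rng.gauss(0.0, 1.0) for _ in range(512)]

print("voiced :", pitch_and_voicing(voiced))
print("whisper:", pitch_and_voicing(whisper))
```

For the periodic signal the correlation peaks at the lag of one pitch period, while for the noise signal no lag reaches the threshold and the frame is reported as unvoiced. Real dysphonic recordings would need a more careful threshold than this fixed value.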
Based on Table 1, it was determined that the unvoiced
phonemes of dysphonic speech need no modification.
Data Collection
The voices of dysphonic patients come out as whispers
because their vocal cords cannot function properly. Choosing
an appropriate method for normal speech reconstruction
requires evaluating both the dysphonic voice and its original
form before the disorder. Since accessing a dysphonic
patient’s original voice recordings is rather difficult,
normal voices and whispers of healthy speakers were used to
choose the proper method. For this purpose, a database
consisting of recordings of normal voices and whispers of
30 male and 20 female speakers aged 25–50 was established.
There is no public database of dysphonic speech in the
literature, so a dysphonic speech database containing 22
patients’ speech recordings was created to appraise the
success of the
BY H. IREM TURKMEN
AND M. ELIF KARSLIGIL
Construction of a Novel System for an Effective
and Efficient Communication
Digital Object Identifier 10.1109/MEMB.2009.000000
IEEE ENGINEERING IN MEDICINE AND BIOLOGY MAGAZINE 0739-5175/10/$26.00©2010IEEE MARCH/APRIL 2010 1