Dysphonic Speech Reconstruction

Verbal communication is one of the most influential and effective ways of social communication. Voiced sounds are produced when the vocal cords vibrate; the flow of air from the lungs to the vocal tract is thus interrupted, and quasi-periodic pulses of air are produced during excitation. Dysphonia is a functional disorder of the larynx caused by pathological vibration of the vocal cords. Chronic dysphonia occurs in the presence of organic lesions in the vocal cords (such as polyps, nodules, and Reinke's edema), lethal diseases of the larynx, throat cancer, neurological disorders, and chronic irritation due to smoking. Because of the pathological formations on the patients' vocal cords, the exhaled air that is sent through the trachea to produce voice makes the vocal cords vibrate only barely or not at all. As a result, the voice comes out as a low whisper and sounds more cracked than usual.

After total laryngectomy, there are three well-established methods to restore the voice. The first is alaryngeal speech, in which the patient speaks by using an electrolarynx. The second is training the patient to use esophageal speech. The third is speaking by using a tracheoesophageal voice prosthesis. Although the resulting speech does not have the quality of the patient's previous voice, patients can speak through an artificial larynx, a voice prosthesis, or esophageal speech.

However, these methods cannot be applied to patients with paralyzed vocal cords or organic lesions of the vocal cords, or to patients who suffer from dysphonia due to a partial laryngectomy, in which some parts of the larynx and vocal cords are removed. Solutions such as voice therapies and/or operations to help these patients speak again may not work at all.

Several systems that analyze and enhance the characteristics of esophageal speech and of electrolarynx speech have been designed so far [1]–[5].
However, the literature reports no research that digitally produces a synthetic voice from the patient's own voice in cases where the patient was treated with partial laryngectomy or has completely lost speech as a result of organic lesions on the vocal cords or of vocal-cord paralysis.

In this article, we present a novel system that delivers synthetic speech with a quality close to natural by reconstructing dysphonic speech. We believe that it will be an important improvement in the social lives of patients, enabling effective and efficient communication.

Acoustic Characteristics of Dysphonic Speech

Chronic dysphonia mainly occurs because of the malfunctioning of the vocal cords. Voice formed this way demonstrates whisperlike characteristics. Dysphonic speech differs from normally phonated speech in terms of voicing, pitch, and formant structure. Spectrograms of normal and dysphonic speech for the Turkish word "Çalışma" (in IPA, ç = tʃ and ş = ʃ) are given in Figure 1.

Figure 1 clearly shows that, unlike the voiced phonemes of normal speech, the voiced phonemes of dysphonic speech have no perceivable pitch period or voicing. In addition, the voiced phonemes of dysphonic speech differ from those of normal speech in terms of formant distortion: the formant bandwidths of dysphonic phonemes are wider, and their formant frequencies are higher. In the unvoiced phonemes of dysphonic speech, however, no significant formant distortion is observed [5]. The differences between dysphonic speech and normal speech are summarized in Table 1 in terms of pitch, voicing, and formant distortion characteristics.

Based on Table 1, we determined that the unvoiced phonemes of dysphonic speech require no modification.

Data Collection

The voices of dysphonic patients come out as whispers because their vocal cords cannot function properly.
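The absence of a pitch period in whisperlike speech can be illustrated with a small numerical sketch. The example below is not the method of this article; it is a generic normalized-autocorrelation check applied to two synthetic frames that merely stand in for voiced and whispered speech. The sampling rate, the 125 Hz pitch, the 60–400 Hz search range, and the function name `periodicity` are all assumptions made for illustration.

```python
import numpy as np


def periodicity(frame, fs, f_min=60.0, f_max=400.0):
    """Return (peak, f0): the largest normalized autocorrelation value
    inside the plausible pitch-lag range, and the frequency of that lag."""
    frame = frame - np.mean(frame)
    # Autocorrelation for non-negative lags only.
    ac = np.correlate(frame, frame, mode="full")[len(frame) - 1:]
    if ac[0] <= 0.0:
        return 0.0, 0.0  # silent frame: no energy, no periodicity
    ac = ac / ac[0]  # normalize so that lag 0 equals 1
    lag_min = int(fs / f_max)
    lag_max = min(int(fs / f_min), len(frame) - 1)
    lag = lag_min + int(np.argmax(ac[lag_min:lag_max + 1]))
    return float(ac[lag]), fs / lag


fs = 16000                 # assumed sampling rate in Hz
t = np.arange(640) / fs    # one 40 ms frame at 16 kHz

# Synthetic "voiced" frame: ten harmonics of an assumed 125 Hz pitch.
voiced = sum(np.sin(2 * np.pi * 125.0 * k * t) / k for k in range(1, 11))

# Synthetic "whisperlike" frame: white noise, i.e., no periodic excitation.
rng = np.random.default_rng(0)
whisper = rng.standard_normal(t.size)

voiced_peak, voiced_f0 = periodicity(voiced, fs)
whisper_peak, _ = periodicity(whisper, fs)
print(f"voiced:  peak={voiced_peak:.2f}, f0={voiced_f0:.1f} Hz")
print(f"whisper: peak={whisper_peak:.2f}")
```

In this sketch the periodic frame yields a strong autocorrelation peak at the lag of its pitch period, whereas the noise frame yields only a weak one. A voicing decision could compare the peak against a threshold, but any such threshold would have to be tuned on real recordings.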
Evaluating both the dysphonic voice and its original form before the disorder is essential for choosing the appropriate method for normal speech reconstruction. Because the original voice recordings of a dysphonic patient are rather difficult to obtain, the normal voices and whispers of healthy speakers were used to choose the proper method. For this purpose, a database consisting of normal-voice and whisper recordings of 30 male and 20 female speakers aged 25–50 was established.

There is no public database of dysphonic speech in the literature, so a dysphonic speech database containing the speech recordings of 22 patients was created to appraise the success of the

BY H. IREM TURKMEN AND M. ELIF KARSLIGIL
Construction of a Novel System for an Effective and Efficient Communication
Digital Object Identifier 10.1109/MEMB.2009.000000
IEEE ENGINEERING IN MEDICINE AND BIOLOGY MAGAZINE, MARCH/APRIL 2010
0739-5175/10/$26.00 ©2010 IEEE