Cybernetics and Systems Analysis, Vol. 49, No. 4, July, 2013
APPLIED ASPECTS OF THE SYNTHESIS AND ANALYSIS OF VOICE INFORMATION
Iu. V. Krak (a, †), Iu. G. Kryvonos (a, ‡), and A. I. Kulias (a, ‡)
UDC 004.8
Abstract. The authors present new results on concatenative segment synthesis of voice information
with prosody and vocal utterance, computer modeling of human voice signals based on joint models of
the human voice source and vocal tract, and speech signal preprocessing for automated documenting
systems. Experiments demonstrate the efficiency of the proposed approaches.
Keywords: synthesis, concatenation, voice signal, automated documenting.
INTRODUCTION
Subsystems for computer synthesis of voice information are an integral part of modern human–machine interfaces. They
are widely used in intelligent multimedia technologies, the design of educational programs, virtual media, library and other
reference manuals, web systems and IP telephony, applications for people with special needs, etc.
The problems of synthesizing and recognizing speech signals, modeling the human vocal apparatus, and automating the
computer documenting of audio information are actively studied and successfully solved all over the world, and in Ukraine in particular.
Synthesis of natural language is an important functional part of artificial intelligence systems since it enables a human-friendly
way of communication. An important task in this field is to develop synthesis methods that make maximum use of speech
characteristics and allow for the prosodic and intonational properties of natural language.
Creating a computer articulatory synthesizer for national languages using mathematical methods of sound
modeling is a significant part of research on producing artificial voice data. This approach requires combining
physical models of the voice source and the human vocal apparatus and developing mathematical methods and numerical
algorithms for solving the acoustic problems.
Automated documenting, such as meeting transcription, is a necessary component of work in many organizations. As
a rule, creating and transcribing a shorthand report is time-consuming, and attempts to speed it up by
increasing the staff are inefficient. To solve such problems, it is necessary to develop methods for preliminary preparation of
the original information (partition into segments, noise elimination, sound enhancement) that are then used to build distributed
computer documentation systems.
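As an illustration of the "partition into segments" step only, the following minimal sketch splits a recording at pauses using short-time energy. It is not the authors' method: the function names, the frame length of 160 samples, the energy threshold, and the minimum pause length are all hypothetical choices made for this example, and noise elimination and sound enhancement are not covered.

```python
import math


def frame_energies(samples, frame_len):
    """Mean squared amplitude of each non-overlapping frame."""
    energies = []
    for start in range(0, len(samples) - frame_len + 1, frame_len):
        frame = samples[start:start + frame_len]
        energies.append(sum(x * x for x in frame) / frame_len)
    return energies


def split_on_silence(samples, frame_len=160, threshold=1e-3,
                     min_silence_frames=3):
    """Return (start, end) sample indices of segments separated by pauses.

    A pause is a run of at least `min_silence_frames` consecutive frames
    whose energy is below `threshold` (all values are assumed defaults).
    """
    energies = frame_energies(samples, frame_len)
    segments = []
    seg_start = None   # index of the first frame of the open segment
    silent_run = 0     # number of consecutive quiet frames seen so far
    for i, energy in enumerate(energies):
        if energy >= threshold:
            if seg_start is None:
                seg_start = i
            silent_run = 0
        elif seg_start is not None:
            silent_run += 1
            if silent_run >= min_silence_frames:
                # Close the segment at the first quiet frame of the run.
                seg_end = i - silent_run + 1
                segments.append((seg_start * frame_len, seg_end * frame_len))
                seg_start = None
                silent_run = 0
    if seg_start is not None:
        # Close a segment still open at the end, dropping trailing quiet frames.
        seg_end = len(energies) - silent_run
        segments.append((seg_start * frame_len, seg_end * frame_len))
    return segments


def tone(n, amp=0.5, freq=440.0, rate=8000):
    """Synthetic test signal: n samples of a sine wave."""
    return [amp * math.sin(2 * math.pi * freq * k / rate) for k in range(n)]


# Two bursts of sound separated by a pause of five frames.
signal = tone(480) + [0.0] * 800 + tone(320)
segments = split_on_silence(signal)
print(segments)
```

Each resulting segment could then be handed to a separate transcriber or processing node, which is the kind of use a distributed documentation system suggests. A fixed energy threshold is the simplest possible criterion and would need tuning, or replacement by an adaptive one, for noisy recordings.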
The purpose of the present study is to analyze and develop new mathematical methods and to improve the existing
approaches to the practical solution of problems of automated synthesis and analysis of voice information.
CONCATENATIVE SYNTHESIS OF VOICE INFORMATION
Speech synthesis systems can be classified according to the way the speech signal is obtained [1, 2]. There are three
main synthesis methods: articulatory, formant, and concatenative. In concatenative synthesis systems, the output
acoustic signal is constructed by successively concatenating the necessary synthesis elements. The main objective
of natural speech synthesis is to design algorithms for voicing information whose sound characteristics approximate
the human voice as closely as possible. The concatenation is determined by the structure and content of the database of
synthesis elements; therefore, the higher the required quality of synthesis, the larger its element base must be. There are
several standard approaches to choosing the minimum elements of the synthesis: phones,
1060-0396/13/4904-0589 © 2013 Springer Science+Business Media New York
a V. M. Glushkov Institute of Cybernetics, National Academy of Sciences of Ukraine, Kyiv, Ukraine,
† Yuri.krak@gmail.com and krak@unicyb.kiev.ua; ‡ aik@public.icyb.kiev.ua. Translated from Kibernetika i Sistemnyi
Analiz, No. 4, July–August, 2013, pp. 120–129. Original article submitted January 24, 2013.