Cybernetics and Systems Analysis, Vol. 49, No. 4, July, 2013

APPLIED ASPECTS OF THE SYNTHESIS AND ANALYSIS OF VOICE INFORMATION

Iu. V. Krak, a† Iu. G. Kryvonos, a‡ and A. I. Kulias a‡

UDC 004.8

Abstract. The authors present new results on three problems: concatenative segment synthesis of voice information with prosody and vocal utterance; computer modeling of human voice signals based on joint models of the human voice source and vocal tract; and speech signal preprocessing for automated documenting systems. Experiments show the efficiency of the proposed approaches.

Keywords: synthesis, concatenation, voice signal, automated documenting.

INTRODUCTION

Subsystems for the computer synthesis of vocal information are an integral part of the modern human–machine interface. They are widely used in highly intelligent multimedia technologies, educational program design, virtual media, library and other manuals, web systems and IP-telephony telecommunication systems, applications for people with special needs, etc.

Problems of the synthesis and recognition of spoken-language signals, modeling of the human vocal apparatus, and automation of computer documenting of audio information are being actively studied and successfully solved all over the world, and in Ukraine in particular.

Synthesis of natural language is an important functional part of artificial intelligence systems since it provides a human-friendly way of communication. An important task in this field is to develop synthesis methods that make maximum use of speech characteristics and allow for the prosodic and intonational properties of natural language.

Creating a computer articulatory synthesizer of national languages that applies mathematical methods of sound modeling is a significant part of research on obtaining artificial voice data.
In such an approach, it is necessary to combine physical models of the vocal source and the human vocal apparatus and to develop mathematical methods and numerical algorithms for solving acoustic problems.

Automated documenting, such as meeting transcription, is a necessary component of work in many organizations. As a rule, creating and transcribing a shorthand report is time-consuming, and attempts to accelerate the process by increasing the staff are inefficient. To solve such problems, it is necessary to develop methods for preliminary preparation of the original information (partition into segments, noise elimination, sound enhancement) that can then be used to create distributed computer documentation systems.

The purpose of the present study is to analyze and develop new mathematical methods and to improve the available approaches to the practical solution of problems of automated synthesis and analysis of voice information.

1060-0396/13/4904-0589 © 2013 Springer Science+Business Media New York

a V. M. Glushkov Institute of Cybernetics, National Academy of Sciences of Ukraine, Kyiv, Ukraine, † Yuri.krak@gmail.com and krak@unicyb.kiev.ua; ‡ aik@public.icyb.kiev.ua. Translated from Kibernetika i Sistemnyi Analiz, No. 4, July–August, 2013, pp. 120–129. Original article submitted January 24, 2013.

CONCATENATIVE SYNTHESIS OF VOICE INFORMATION

Speech synthesis systems can be classified according to the way the vocal signal is obtained [1, 2]. There are three main synthesis methods: articulatory, formant, and concatenative. In concatenative synthesis systems, the output acoustic signal is constructed by successively concatenating the required synthesis elements. The main objective of natural speech synthesis is to design algorithms for voicing information whose sound characteristics approximate the human voice as closely as possible. Concatenation is determined by the structure and content of the database of synthesis elements; therefore, the higher the quality of synthesis, the larger its element base. There are several standard approaches to choosing the minimum elements of the synthesis: phones,