International Journal of Computer Applications (0975-8887)
Volume 154, No. 5, November 2016

A Study on Detection of Intonation Events of Assamese Speech Required for Tilt Model

Parismita Sarma
Research Scholar
Department of Information Technology
Gauhati University, Guwahati

Sikhar Kumar Sarma
Professor
Cotton College State University
Guwahati, Assam, India

ABSTRACT
This paper presents a study and experimental analysis of different intonation events in Assamese speech. Assamese is a language of North East India spoken by lakhs of people. Researchers need an intonation model to identify language-specific intonation events, which are necessary for synthesizing that particular language. The paper reports the outcomes of experiments carried out with different speech software and observes intonation behavior such as fundamental frequency, duration and boundary tone on segments of Assamese utterances. It covers the Tilt intonation model and the phonology of Assamese that is necessary for speech synthesis. Some unique characteristics of Assamese utterances and their syllable-level behavior are also described.

Keywords
Intonation, speech synthesis, fundamental frequency, duration

1. INTRODUCTION
In text-to-speech synthesis of a language, prosody has to be prepared. Phonetic module preparation is one phase of speech synthesis, and prosody has to be generated in this phase. Prosodic modeling involves generation of the F0 contour, accent prediction, prosodic phrasing, etc. [1]. Two tasks are associated with intonation prediction: accent prediction and F0 contour realization. Since the earliest intonation models, many improvements have been made to supervised and unsupervised models of intonation. The quality of synthesized speech is improving steadily, and at the same time the demand for more sophisticated intonation models is increasing. Synthesis of Assamese is based on the unit selection concatenative synthesis method.
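As a rough illustration of how the Tilt model named in the title parameterizes one intonation event, the sketch below follows the commonly cited Tilt formulation, in which an event is described by the amplitude and duration of its rise and its fall. The formulas are not given in this paper and are assumed here from the standard description of the model; the function name, the class name and the sample values are hypothetical and are not measurements from Assamese speech.

```python
from dataclasses import dataclass


@dataclass
class TiltEvent:
    amplitude: float  # |A_rise| + |A_fall|, in Hz
    duration: float   # D_rise + D_fall, in seconds
    tilt: float       # -1.0 (pure fall) to +1.0 (pure rise)


def tilt_parameters(a_rise, a_fall, d_rise, d_fall):
    """Convert rise/fall amplitudes (Hz) and durations (s) of one
    intonation event into amplitude, duration and tilt.

    Assumed formulation (standard Tilt model, not taken from this paper):
      tilt_amp = (|A_rise| - |A_fall|) / (|A_rise| + |A_fall|)
      tilt_dur = (D_rise - D_fall) / (D_rise + D_fall)
      tilt     = (tilt_amp + tilt_dur) / 2
    """
    amplitude = abs(a_rise) + abs(a_fall)
    duration = d_rise + d_fall
    if amplitude == 0 or duration == 0:
        raise ValueError("event has no F0 movement or no duration")
    tilt_amp = (abs(a_rise) - abs(a_fall)) / amplitude
    tilt_dur = (d_rise - d_fall) / duration
    return TiltEvent(amplitude, duration, 0.5 * (tilt_amp + tilt_dur))


if __name__ == "__main__":
    # Hypothetical event: F0 rises by 20 Hz over 0.15 s, then falls by 10 Hz over 0.05 s.
    print(tilt_parameters(20.0, -10.0, 0.15, 0.05))
```

Under this formulation a tilt near +1 describes an event that is almost entirely a rise, a tilt near -1 an event that is almost entirely a fall, and a tilt near 0 a symmetric rise-fall.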
In unit selection synthesis, the main concern is the fundamental frequency (F0) of the units and their duration values. These parameters are essential for obtaining natural-sounding speech [2]. The Tilt model helps to identify these characteristics. Without rhythm or tone, any kind of speech becomes monotonous, like a robot speaking. To obtain natural-sounding speech, the intonation and rhythm of the individual units must be taken care of. The phonetic module is one phase of the speech synthesis procedure; it deals with the phonemic significance of the script [3]. The phonetic module also includes prosodic modeling, which covers F0 prediction and contour generation, prosodic phrasing and accent.

Tilt is a tool used for event detection in the speech synthesis process. It converts different intonation events into a sequence of parameterized events by means of conversion methods. The model consists of a collection of functions that form a library. The fundamental functions of Tilt are analysis of text, synthesis, and rectification or modification of the values obtained from the raw speech wave.

2. ASSAMESE LANGUAGE
Assam is situated in the north-eastern part of India, and Assamese is a widely spoken language in that region. The origin of the Assamese language lies mostly in the Indo-European family. Dr. B. K. Kakoti mentioned in his PhD thesis that a large number of its words were borrowed from the Indo-Chinese family. The language is also associated with the Indo-Aryan family [4]. It has eight vowel phonemes, twenty-one consonant phonemes and many diphthongs [5].

Assamese has a number of characteristics of its own. It has no retroflex sounds, although such sounds are very common in the southern part of India. The velar nasal /ŋ/ occurs frequently in Assamese words, and the phoneme /w/ is used extensively. The most important feature of Assamese is its use of the velar fricative /x/, a phoneme that is absent from the other Indian languages.

3.
SOME INTONATIONAL CHARACTERISTICS

3.1 Tone
Tone can be defined as a property of a syllable. In other words, it is a type of pitch movement. The ancient Indian Vedic languages were tonal languages [6]. Tone and its specific structure can express the meaning of a sentence. Sentences, phrases and even individual words can carry different meanings depending on the tone involved. Tone is prominent on the syllables of a word. The use of different types of tone is a distinctive characteristic of the Sino-Tibetan languages. In a sentence or word, some syllables become more prominent because of tone and are capable of expressing different meanings.

Four types of tone are found in the different contexts of a sentence. Vocal cord vibration has a crucial impact on the intonation of a language. When the vocal cord vibration increases up to a maximum, the tone is called a rising or acute tone. When the vowel sound goes on decreasing, the tone is called a falling tone. When the vowel sound rises suddenly and immediately falls again, the tone is called a circumflex tone [7]. A neutral tone, which neither rises nor falls, is called a level tone.

An intonation model has to predict speech parameters from written text. Written text carries no information about stressed syllables or the F0 contour. To design a good intonation model, the prediction of the accented syllable and of the type of accent must be correct. The next prediction concerns F0 contour generation. If the accent or tone is known, generating the F0 contour is not difficult. For example, the Assamese sentence “ৰাতুল আজি আজিব” (Ratul will come today) can be uttered in three different moods. In the normal mood, the sentence gives the information that Ratul will come. The F0 contour for the normal mood is shown in Fig. 1. If the same sentence is turned into a question, the F0 contour is as shown in Fig. 2. It is seen that the normal