The Preliminary Results of a Mandarin Dictation Machine Based upon Chinese Natural Language Analysis

Lin-shan Lee*,**, Chiu-yu Tseng**,***, K.J. Chen**, and James Huang****

* Dept. of Computer Science and Information Engineering, National Taiwan University, Taipei, Taiwan, Rep. of China. Tel: (02) 392-2444
** The Institute of Information Science, Academia Sinica, Taipei, Taiwan, Rep. of China.
*** The Institute of History and Philology, Academia Sinica, Taipei, Taiwan, Rep. of China.
**** Dept. of Modern Languages and Linguistics, Cornell University, Ithaca, N.Y., USA.

Abstract

This paper describes the preliminary results of the first research effort in the world toward a Mandarin dictation machine for the input of Chinese characters to computers. Considering the special characteristics of the Chinese language, syllables are chosen as the basic units for dictation. The machine is divided into two subsystems. The first recognizes the syllables using speech signal processing techniques. Because every syllable can represent many different characters with completely different meanings, the second subsystem then identifies the exact characters from the syllables and corrects the errors in syllable recognition. It does so by first forming all possible words from the syllables and then finding a combination of those words that is grammatically valid in a sentence. The preliminary test results indicate that such a dictation machine is not only practically attractive but also technically achievable.

I. Introduction

Today, the input of Chinese characters into computers is still a very difficult and unsolved problem, which is the basic motivation for the development of a Mandarin dictation machine. We define the scope of the research by the following limitations. The input speech is in the form of isolated syllables instead of continuous speech (the choice of syllables as the dictation unit is discussed in detail later). The machine is speaker dependent.
The first-stage goal of this system is to achieve 90% correctness for the sentences in the Chinese textbooks of the primary schools in Taiwan, Rep. of China. The errors can be found by the user on the screen and corrected from the keyboard. Even so, such performance is still much more efficient than any of the currently existing input systems. Also, only a small dictionary for demonstration purposes is to be established in the first stage. The machine will work well for sentences formed from the words in the dictionary; otherwise, the new words have to be keyed into the dictionary. To our knowledge, this is the first research effort toward a Mandarin dictation machine in the world.

The dictation machine is divided into two subsystems. The first recognizes the syllables using speech signal processing techniques. This alone is not sufficient, because in general every syllable can represent many different characters and can form different multi-syllabic words with the syllables on its right or left. Therefore the second subsystem identifies the correct characters from the syllables by forming words that are grammatically valid in a sentence, carefully considering the characteristics of the Chinese natural language.

II. Considerations for the Special Structure of Chinese Language

There are at least 80 thousand commonly used words in Chinese, so words cannot be used as the dictation units. There are at least 20 thousand commonly used Chinese characters, and each character is mono-syllabic. Each word is composed of one to several characters. A nice feature is that the total number of different syllables in Mandarin speech is only about 1300. If we use the 1300 syllables as the dictation units, all the words and characters will be covered. However, the small number of syllables implies another difficult problem: many different characters share the same syllable. This is why we need the second subsystem.
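The homophone problem described above can be illustrated with a minimal Python sketch. The table below is a tiny, hand-picked example and is not the dictionary used in this work; the names HOMOPHONES and candidate_strings are hypothetical. It shows how the number of character strings consistent with a syllable sequence grows multiplicatively with the number of characters that share each syllable.

```python
from itertools import product

# Hypothetical, hand-picked homophone table for illustration only:
# (base syllable, tone) -> some characters sharing that pronunciation.
HOMOPHONES = {
    ("guo", 2): ["國", "幗", "虢"],
    ("iu", 3): ["語", "雨", "羽", "宇", "與"],
}

def candidate_strings(syllables):
    """Enumerate every character string consistent with a syllable sequence."""
    choices = [HOMOPHONES[s] for s in syllables]
    return ["".join(chars) for chars in product(*choices)]

# Two syllables with 3 and 5 homophones give 3 * 5 candidate strings,
# only one of which is the intended word.
candidates = candidate_strings([("guo", 2), ("iu", 3)])
print(len(candidates))
print("國語" in candidates)
```

Even in this toy case, syllable recognition alone leaves many candidate strings, so some further source of knowledge (words, grammar) is needed to select among them.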
Based on the above observations on the special structure of the Chinese language, the use of syllables as the dictation unit becomes a very natural choice.

Another important special feature of Mandarin Chinese is the tones of the syllables. In general, every character is assigned a tone. There are basically four different tones. It has been shown* that the primary difference among the four tones is in the pitch contours, and that the tones are essentially independent of the other acoustic properties of the syllables. If the differences among the syllables due to lexical tones are disregarded, only 411 syllables are required to represent all the pronunciations of Mandarin Chinese. This means the recognition of the syllables can be divided into two parallel procedures: the recognition of the tones, and the recognition of the 411 syllables disregarding the tones.

III. The Overall System Structure

Based on the considerations described above, the overall system structure for the Mandarin dictation machine is shown in Fig. 1. The system is basically divided into two subsystems. The first recognizes the syllables, and the second transforms the series of syllables into characters. In the first subsystem, the syllable (disregarding the tone) and the tone are recognized independently in parallel. Because recognition errors are unavoidable, this subsystem also has to provide information about confusable syllables and confusable tones.

For the second subsystem, we first need to form multi-syllabic words from the syllables. To use the above example, although there are many characters that correspond to the syllable [guo-2] and many to [iu-3], there is only one multi-syllabic word, "國語 (Mandarin)", that has the pronunciation "[guo-2] [iu-3]". But this does not solve the problem well. First, mono-syllabic words, such as [ni-3], [shr-4], [i-2], [jia-4], [huei-4], [tieng-1], cannot be identified in the above way.
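The word-formation step described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the implementation of this system: the lexicon is a hypothetical toy table, each position of the input holds a list of candidate (syllable, tone) pairs so that a confusable tone can be included, and the names LEXICON, MAX_WORD_LEN and form_words are introduced only for this example.

```python
from itertools import product

# Hypothetical toy lexicon: tuple of (base syllable, tone) -> words.
LEXICON = {
    (("guo", 2), ("iu", 3)): ["國語"],
    (("guo", 2),): ["國"],
    (("iu", 3),): ["語", "雨"],
    (("ni", 3),): ["你"],
    (("shr", 4),): ["是", "事"],
}
MAX_WORD_LEN = 2  # longest word (in syllables) in the toy lexicon

def form_words(lattice):
    """Return (start, end, word) for every lexicon word matching the lattice.

    lattice[i] lists the candidate syllables at position i: the recognized
    syllable plus any confusable alternatives.
    """
    found = []
    for start in range(len(lattice)):
        for length in range(1, MAX_WORD_LEN + 1):
            end = start + length
            if end > len(lattice):
                break
            # Try every combination of candidates over this span.
            for key in product(*lattice[start:end]):
                for word in LEXICON.get(key, []):
                    found.append((start, end, word))
    return found

# The last position carries a confusable tone (3 or 2) as an alternative.
lattice = [
    [("ni", 3)],
    [("shr", 4)],
    [("guo", 2)],
    [("iu", 3), ("iu", 2)],
]
words = form_words(lattice)
for start, end, word in words:
    print(start, end, word)
```

In this toy run the two-syllable span [guo-2] [iu-3] yields a single word, while the mono-syllabic position [shr-4] still yields more than one candidate, which is the limitation noted in the text.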
Such mono-syllabic words can even form ambiguous multi-syllabic words; for example, the syllable [i-2]

Lee, Tseng, Chen, and Huang 619