Evaluation of ITU-T G.728 as a Voice over IP codec for Chinese Speech

F. L. Chong, Department of Computer Science, University of Canterbury, Christchurch, New Zealand. Email: flc16@student.canterbury.ac.nz
K. Pawlikowski, Department of Computer Science, University of Canterbury, Christchurch, New Zealand. Email: krys@cosc.canterbury.ac.nz
I. V. McLoughlin, Group Research, Tait Electronics Ltd, Christchurch, New Zealand. Email: ian.mcloughlin@tait.co.nz

Abstract— Voice over IP (VoIP) is expected to become a popular service offered over the Internet, so it is important to ensure a high quality of service. In this paper, we examine two standards proposed for evaluating the intelligibility of Chinese speech. The Chinese Diagnostic Rhyme Test (CDRT) adopts the philosophy and methodology of the Diagnostic Rhyme Test (DRT) used for English speech and evaluates six elementary phonemic attributes of Chinese words. Because Chinese is a tonal language, an extension of the CDRT, called CDRT-Tone, evaluates the tonal attributes of Chinese speech. These two tests were used to evaluate the ITU-T G.728 speech coder as a VoIP codec for Chinese speech. The results are compared with previous evaluations of a GSM 06.10 coder.

I. INTRODUCTION

Voice over IP systems use speech codecs to reduce the bandwidth needed for transmission and the space needed for storage. Because some speech information is lost during coding, the original speech may not be exactly recoverable after transmission. This loss of information can affect both the intelligibility and the quality of the output speech. Here, intelligibility refers to how well a listener can understand what is being said, and quality refers to how good the speech is perceived to be. Although these are two different attributes, they are not entirely independent of each other. Good quality implies high intelligibility, but the converse does not hold: speech can be intelligible without being of good quality.
Various intelligibility and quality tests have been introduced to assess these two attributes for IP networks and speech codecs. Such tests can be categorised as subjective or objective. In subjective tests, a panel of human listeners rates one of the two attributes. Objective tests use mathematical measures to estimate speech quality. Well-known subjective intelligibility tests include the Diagnostic Rhyme Test (DRT), the Modified Rhyme Test (MRT), and Phonetically Balanced Word Lists (PB) [1]. These three are listed as ANSI standards for speech intelligibility testing. The more popular subjective quality tests are the Diagnostic Acceptability Measure (DAM) and the Mean Opinion Score (MOS) [2]. Objective quality measures include the Perceptual Speech Quality Measure (PSQM) [3], the Perceptual Evaluation of Speech Quality (PESQ) [4], and the Deutsche Telekom Speech Quality Estimation (DT-SQE) [5].

This paper deals with intelligibility, in particular the intelligibility of Chinese speech. Mandarin Chinese is spoken by more than one billion people throughout the world. To support better quality of service in the Voice over IP environment, two standards for testing the intelligibility of Chinese speech have been proposed: the Chinese Diagnostic Rhyme Test (CDRT) [6] and its extension, CDRT-Tone [7]. This paper reviews the CDRT and CDRT-Tone test methods and applies them to the ITU-T G.728 speech coder. The results are compared with those of previous evaluations of a GSM 06.10 coder [8][7].

A. Chinese Diagnostic Rhyme Test (CDRT)

The CDRT was proposed to evaluate the intelligibility of Chinese speech transmitted through communication systems. It adopts the philosophy and methodology of the DRT and is effectively the DRT applied to Chinese. It uses a corpus of 192 words arranged in 96 rhyming pairs.
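The English DRT scores a listener's two-alternative forced-choice responses with a correction for guessing, S = 100(R − W)/T. In this formula, R and W are the numbers of right and wrong responses and T = R + W. The sketch below assumes that the CDRT keeps this DRT scoring rule, which this section does not state explicitly. It tallies the score separately for each of the six attributes, which is what allows the test to point to a specific weakness. The `Trial` record and the function names are hypothetical. The word labels in the example are placeholders and do not come from the CDRT corpus in [6].

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Dict, Iterable, Tuple

# The six elementary phonemic attributes tested by the CDRT.
ATTRIBUTES = ("airflow", "nasality", "sustention",
              "sibilation", "graveness", "compactness")


@dataclass(frozen=True)
class Trial:
    """One listener response to one rhyming pair (hypothetical record layout)."""
    attribute: str  # attribute in which the two words of the pair differ
    presented: str  # word actually played to the listener
    chosen: str     # word the listener selected from the pair


def drt_score(right: int, wrong: int) -> float:
    """Chance-corrected score in percent: 100 * (R - W) / (R + W).

    With two alternatives, random guessing gives R == W on average,
    so pure guessing scores about 0 and perfect responses score 100.
    Assumes the CDRT uses the same rule as the English DRT.
    """
    total = right + wrong
    if total == 0:
        raise ValueError("no responses to score")
    return 100.0 * (right - wrong) / total


def score_by_attribute(trials: Iterable[Trial]) -> Tuple[float, Dict[str, float]]:
    """Return (overall score, {attribute: score}) for a set of responses."""
    counts = defaultdict(lambda: [0, 0])  # attribute -> [right, wrong]
    for t in trials:
        if t.attribute not in ATTRIBUTES:
            raise ValueError(f"unknown attribute: {t.attribute!r}")
        # Index 0 counts a right answer, index 1 a wrong one.
        counts[t.attribute][0 if t.chosen == t.presented else 1] += 1
    per_attribute = {a: drt_score(r, w) for a, (r, w) in counts.items()}
    right = sum(r for r, _ in counts.values())
    wrong = sum(w for _, w in counts.values())
    return drt_score(right, wrong), per_attribute


if __name__ == "__main__":
    # Placeholder labels "A"/"B" stand for the two words of a rhyming pair.
    responses = [
        Trial("nasality", presented="A", chosen="A"),
        Trial("nasality", presented="A", chosen="B"),
        Trial("airflow", presented="B", chosen="B"),
        Trial("airflow", presented="A", chosen="A"),
    ]
    overall, per_attribute = score_by_attribute(responses)
    print(overall, per_attribute)  # 50.0 {'nasality': 0.0, 'airflow': 100.0}
```

In this toy run, the overall score of 50 hides the fact that nasality is being judged at chance level. The per-attribute breakdown is what reveals it.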
These 96 rhyming pairs test six elementary phonemic attributes: airflow, nasality, sustention, sibilation, graveness, and compactness. Because the results show which attribute fails, a system's flaws can be identified, and therefore corrected, more easily. The DRT tests the important attributes of English speech rather extensively. The CDRT, however, does not test all the characteristics of Chinese speech, because Chinese, unlike English, is a tonal language. The CDRT only discriminates between consonants, so neither vowels nor tones are tested. Hence the CDRT alone is not enough to draw a firm conclusion about the intelligibility of Chinese speech in a particular system. The corpus of Chinese characters is given in [6].

B. CDRT-Tone

In the Chinese language, most syllables can be pronounced with one of four different tones [9]. Tone 1 is a high-level tone, tone 2 is a mid-rising tone, tone 3 is a low-falling-rising tone, and tone 4 is a high-falling tone. Figure 1 shows the frequency characteristics of the four tones. Pronouncing a syllable with a different tone gives it a different meaning. For example, the Chinese syllable “ma” will mean