ICPhS XVII Regular Session Hong Kong, 17-21 August 2011

SUBJECTIVE INTELLIGIBILITY TESTING AND PERCEPTUAL STUDY OF THAI INITIAL AND FINAL CONSONANTS

C. Tantibundhit a, C. Onsuwan b, S. Thatphithakkul c, P. Chootrakool c, K. Kosawat c, N. Thatphithakkul c, T. Saimai a & N. Saimai a

a Department of Electrical and Computer Engineering, Thammasat University, Thailand; b Department of Linguistics, Thammasat University, Thailand; c National Electronics and Computer Technology Center (NECTEC), Thailand

tchartur@engr.tu.ac.th; consuwan@tu.ac.th

ABSTRACT

We methodically design and develop a subjective intelligibility test for Thai speech based on the diagnostic rhyme test (DRT). The Thai DRT (TDRT) consists of two test sets, one for initial consonants and the other for final consonants. The test for initials is designed to compare 21 phonemes pairwise on an equal basis, which results in 210 stimulus pairs. The TDRT for finals compares 8 final phonemes, yielding 84 stimulus pairs. Both tests are constructed using real words. The TDRT has two main advantages: it allows us to evaluate percent intelligibility responses for each stimulus pair and to systematically compare confusion responses across all phonemes. To test the validity of our method and to further our investigation, we carry out the subjective intelligibility test on twenty-eight Thai listeners using the TDRT at 4 SNR levels (6, 12, 18, and 24 dB). Average intelligibility scores and confusion matrices for initial and final consonants are analyzed.

Keywords: Thai, diagnostic rhyme test, subjective intelligibility, initial/final consonants, confusion matrix

1. INTRODUCTION

Speech intelligibility and speech quality are two distinct properties. Speech quality reflects how an utterance is produced and includes attributes such as natural, raspy, or hoarse. Speech intelligibility, on the other hand, refers to what is being said, i.e., the meaning or the content of the spoken words [5].
Therefore, speech intelligibility is one of the essential attributes of the speech signal and needs to be preserved by speech enhancement algorithms [5]. Several algorithms have been developed specifically to enhance speech intelligibility in background noise [5]. The intelligibility of enhanced speech relative to the original speech is often evaluated using subjective intelligibility testing [5]. Several intelligibility tests have been proposed for English that use rhyming words presented in a six-response [2] or a pair-response [8] format. House et al. developed a test that restricts response choices to a finite set of six rhyming words, called the modified rhyme test (MRT) [2]. The test was composed of 50 sets, each consisting of six monosyllabic consonant-vowel-consonant (CVC) words. Twenty-five sets differed in their initial consonants, while the rest differed in their final consonants. Voiers refined the MRT and created the diagnostic rhyme test (DRT) [8], which is widely used as a subjective test for measuring the intelligibility of speech coders [5]. The DRT was an A/B forced comparison test based on word pairs differing in their initial consonants by one of six distinctive features [8]. The DRT test material was composed of a word list of 96 rhyming pairs, e.g., veal - feel. As the DRT was developed specifically for English, it has some limitations when evaluating the intelligibility of a tonal language such as Chinese [6]. McLoughlin developed a New Chinese diagnostic rhyme test (NCDRT) [6]. The NCDRT was composed of a test set of Chinese phonemes, which were classified under six distinctive features similar to those of the DRT [6]. Although subjective intelligibility testing of a tonal language such as Chinese is well underway [6], subjective intelligibility testing of another tonal language, Thai, which differs from Chinese acoustically and phonemically in several respects, has yet to be developed.
Therefore, this paper proposes an intelligibility test for Thai speech, specifically for its initial and final consonants. The tests are designed to facilitate the evaluation of percent intelligibility responses for each stimulus pair and the systematic comparison of confusion responses across all initial and final phonemes.
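
The pairwise design and the two kinds of output described above can be illustrated with a minimal Python sketch. It is not the authors' procedure: the phoneme labels (`I1`, `F1`, ...) are placeholders rather than the Thai inventory, the trial data are invented, and the function names are hypothetical. Enumerating unordered pairs of 21 initials gives the 210 pairs stated in the abstract. For 8 finals the same enumeration gives 28 unordered pairs; the abstract reports 84 stimulus pairs for finals, which is three times that number, and this section does not explain how the additional pairs arise, so the sketch does not attempt to reproduce that figure.

```python
from collections import Counter
from itertools import combinations


def stimulus_pairs(phonemes):
    """Return every unordered pair of distinct phonemes."""
    return list(combinations(phonemes, 2))


def percent_intelligibility(responses):
    """Percent of trials in which the listener chose the target.

    `responses` is a list of (target, chosen) tuples for one stimulus pair.
    """
    if not responses:
        raise ValueError("no responses to score")
    correct = sum(1 for target, chosen in responses if target == chosen)
    return 100.0 * correct / len(responses)


def confusion_counts(responses):
    """Count how often each target was heard as each response."""
    return Counter(responses)


# Placeholder labels (assumption), not the actual Thai phoneme inventory.
initials = [f"I{i}" for i in range(1, 22)]
finals = [f"F{i}" for i in range(1, 9)]

n_initial_pairs = len(stimulus_pairs(initials))  # 21 * 20 / 2
n_final_pairs = len(stimulus_pairs(finals))      # 8 * 7 / 2

# Invented trials for one hypothetical pair (I1, I2).
trials = [("I1", "I1"), ("I1", "I2"), ("I2", "I2"), ("I2", "I2")]
score = percent_intelligibility(trials)
confusions = confusion_counts(trials)
```

Keeping the confusion counts keyed by (target, chosen) means that a full confusion matrix can be assembled later by pooling the counters from all stimulus pairs, while the per-pair percentage stays available on its own.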