Computational Intelligence, Volume 18, Number 3, 2002

A METHOD FOR ISOLATED THAI TONE RECOGNITION USING A COMBINATION OF NEURAL NETWORKS

NUTTAKORN THUBTHONG
Department of Physics and Computer Engineering, Chulalongkorn University, Bangkok, Thailand

BOONSERM KIJSIRIKUL
Department of Computer Engineering, Chulalongkorn University, Bangkok, Thailand

APIRATH PUSITTRAKUL
Department of Physics, Chulalongkorn University, Bangkok, Thailand

Tone information is very important to speech recognition in a tonal language such as Thai. In this article, we present a method for isolated Thai tone recognition. First, we define three sets of tone features to capture the characteristics of Thai tones and employ a feedforward neural network to classify tones based on these features. Next, we describe several experiments using the proposed features. The experiments are designed to study the effect of initial consonants, vowels, and final consonants on tone recognition. We find that there are some correlations between tones and other phonemes, and that the recognition performance is satisfactory. A human perception test is then conducted to judge the recognition rate. The human recognition rate is much lower than that of the machine. Finally, we explore various combination schemes to enhance the recognition rate. Further improvements are found in most experiments.

Key words: Thai tone, tone recognition, combination of neural networks, combination rules, voting techniques.

1. INTRODUCTION

During the past decade, speech recognition technology has made significant progress. Several applications of speech recognition to human–computer interfaces have been developed because speech is the most natural method of human communication and interaction. Most of the existing methods for speech recognition were developed mainly for spoken English, and some of them have been adapted to Thai. However, unlike English, Thai is a tonal language.
In such a language, the referential meaning of an utterance is dependent on the lexical tones (Jain 1998). Therefore, a tone classifier is an essential component of a speech recognition system for a tonal language.

Many methods of tone recognition have been proposed for both isolated and continuous speech in Mandarin and Cantonese. These include methods based on multilayer perceptrons for four-tone recognition of isolated Mandarin syllables (Chang, Sue, and Chen 1990) and for nine-tone recognition of isolated Cantonese syllables (Lee et al. 1993, 1995), and methods based on a hidden Markov model (HMM) for four-tone recognition of isolated Mandarin syllables (Yang et al. 1988) and for five-tone recognition of continuous Mandarin speech (Wang and Chen 1994). A fuzzy C-means-based method for four-tone recognition of isolated Mandarin syllables (Li, Xia, and Gu 1999) has also been proposed.

In Thai, there are five different lexical tones: the mid /M/, the low /L/, the fall /F/, the high /H/, and the rise /R/. The following examples show the effect of tones on the meaning of an utterance (Luksaneeyanawin 1998): M /khā:/ (“a kind of grass”), L /khà:/ (“galangale”), F /khâ:/ (“to kill”), H /khá:/ (“to trade”), and R /khǎ:/ (“a leg”).

Address correspondence to N. Thubthong at Departments of Physics and Computer Engineering, Chulalongkorn University, Bangkok 10330, Thailand.
© 2002 Blackwell Publishing, 350 Main Street, Malden, MA 02148, USA, and 108 Cowley Road, Oxford, OX4 1JF, UK.

The tone