Q. Huo et al. (Eds.): ISCSLP 2006, LNAI 4274, pp. 475–484, 2006. © Springer-Verlag Berlin Heidelberg 2006

Language Identification by Using Syllable-Based Duration Classification on Code-Switching Speech

Dau-cheng Lyu 2,3, Ren-yuan Lyu 1, Yuang-chin Chiang 4, and Chun-nan Hsu 3

1 Dept. of Computer Science and Information Engineering, Chang Gung University
2 Dept. of Electrical Engineering, Chang Gung University
3 Institute of Information Science, Academia Sinica
4 Institute of Statistics, National Tsing Hua University
renyuan.lyu@gmail.com

Abstract. Many approaches to automatic spoken language identification (LID) are successful on monolingual speech, but LID on code-switching speech, which requires identifying at least two languages within one acoustic utterance, challenges these approaches. In [6], we successfully used a one-pass approach to recognize Chinese characters in Mandarin-Taiwanese code-switching speech. In this paper, we introduce a classification method, named syllable-based duration classification, that uses three clues to identify the specific language in code-switching speech: the recognized common tonal syllable, its corresponding duration, and the speech signal. Experimental results show that the performance of the proposed LID approach on code-switching speech is close to that of a parallel tonal syllable recognition LID system on monolingual speech.

Keywords: language identification, code-switching speech.

1 Introduction

Code-switching is defined as the use of more than one language, variety, or style by a speaker within an utterance or discourse. It is a common phenomenon in many bilingual societies. In Taiwan, at least two languages (or dialects, as some linguists prefer to call them), Mandarin and Taiwanese, are frequently mixed and spoken in daily conversations.
For monolingual LID system development, parallel syllable recognition (PSR) was adopted, which is similar to parallel phone recognition (PPR), an approach widely used in automatic LID research [1-5]. The syllable is used as the recognition unit instead of the phone because both Taiwanese and Mandarin are syllabic languages. Another approach, parallel phone recognition followed by language modeling (parallel PRLM), uses language-dependent acoustic phone models to convert speech utterances into sequences of phone symbols, which are then decoded with language models. The resulting acoustic and language scores are combined into language-specific scores for making an LID decision. Compared with parallel PRLM, PSR uses integrated acoustic models