Q. Huo et al. (Eds.): ISCSLP 2006, LNAI 4274, pp. 475 – 484, 2006.
© Springer-Verlag Berlin Heidelberg 2006
Language Identification by Using Syllable-Based
Duration Classification on Code-Switching Speech
Dau-cheng Lyu²,³, Ren-yuan Lyu¹, Yuang-chin Chiang⁴, and Chun-nan Hsu³
¹ Dept. of Computer Science and Information Engineering, Chang Gung University
² Dept. of Electrical Engineering, Chang Gung University
³ Institute of Information Science, Academia Sinica
⁴ Institute of Statistics, National Tsing Hua University
renyuan.lyu@gmail.com
Abstract. Many approaches to automatic spoken language identification (LID)
on monolingual speech have been successful, but LID on code-switching speech,
which requires identifying at least two languages within a single utterance,
challenges these approaches. In [6], we successfully used a one-pass approach to
recognize Chinese characters in Mandarin-Taiwanese code-switching speech. In
this paper, we introduce a classification method, named syllable-based duration
classification, that identifies the specific language in code-switching speech
from three clues: the recognized common tonal syllable, its corresponding
duration, and the speech signal. Experimental results show that the
performance of the proposed LID approach on code-switching speech is
close to that of a parallel tonal syllable recognition LID system on monolingual
speech.
Keywords: language identification, code-switching speech.
1 Introduction
Code-switching is defined as the use of more than one language, variety, or style by a
speaker within an utterance or discourse. It is a common phenomenon in many
bilingual societies. In Taiwan, at least two languages (or dialects, as some linguists
prefer to call them), Mandarin and Taiwanese, are frequently mixed and spoken in
daily conversations.
For monolingual LID system development, parallel syllable recognition
(PSR) was adopted. It is similar to parallel phone recognition (PPR),
an approach widely used in automatic LID research [1-5]. Here,
syllables rather than phones are used as the recognition units because both
Taiwanese and Mandarin are syllabic languages. Another approach, called
parallel phone recognition followed by language modeling (parallel PRLM), uses
language-dependent acoustic phone models to convert speech utterances into
sequences of phone symbols, followed by language decoding. The resulting
acoustic and language scores are then combined into language-specific scores for making
an LID decision. Compared with parallel PRLM, PSR uses integrated acoustic models