Research Article
Language Recognition Using Latent Dynamic Conditional
Random Field Model with Phonological Features
Sirinoot Boonsuk,1 Atiwong Suchato,1 Proadpran Punyabukkana,1 Chai Wutiwiwatchai,2 and Nattanun Thatphithakkul2
1 Department of Computer Engineering, Chulalongkorn University, Bangkok 10330, Thailand
2 HLT, National Electronics and Computer Technology Center (NECTEC), Bangkok 10400, Thailand
Correspondence should be addressed to Atiwong Suchato; atiwong.s@chula.ac.th
Received 27 September 2013; Revised 23 December 2013; Accepted 23 December 2013; Published 20 February 2014
Academic Editor: Yue Wu
Copyright © 2014 Sirinoot Boonsuk et al. This is an open access article distributed under the Creative Commons Attribution
License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly
cited.
Spoken language recognition (SLR) has been of increasing interest in multilingual speech recognition for identifying the languages
of speech utterances. Most existing SLR approaches apply statistical modeling techniques with acoustic and phonotactic features.
Among the popular approaches, the acoustic approach has attracted greater interest than others because it does not require
any prior language-specific knowledge. Previous research on the acoustic approach has shown less interest in applying linguistic
knowledge; it was only used as supplementary features, while the current state-of-the-art system assumes independence among
features. This paper proposes an SLR system based on the latent-dynamic conditional random field (LDCRF) model using
phonological features (PFs). We use PFs to represent acoustic characteristics and linguistic knowledge. The LDCRF model was
employed to capture the dynamics of the PF sequences for language classification. Baseline systems were built to evaluate the
features and methods, including Gaussian mixture model (GMM) based systems using PFs, GMM using cepstral features, and the
CRF model using PFs. Evaluated on the NIST LRE 2007 corpus, the proposed method showed an improvement over the baseline
systems. Additionally, it showed results comparable to those of the acoustic system based on i-vectors. This research demonstrates that
utilizing PFs can enhance SLR performance.
1. Introduction
Spoken language recognition (SLR) is the task of determining
the language of a spoken utterance. SLR has become an
important component in many speech processing applications,
such as preprocessing for multilingual speech
recognition systems and automatic selection of the appropriate
language in information service applications. Recent
research on SLR can be divided into two approaches:
(1) the acoustic approach [1, 2], which directly models the
distributions of acoustic features from speech signals; and
(2) the phonotactic approach [3, 4], which utilizes phone
sequences tokenized from speech utterances to build
n-gram language models of these phones. An obvious
shortcoming of the phonotactic approach is that manual phonetic
transcription of speech data is required for constructing
the language models. The acoustic approach has become a
popular alternative that overcomes this issue because
it requires neither prior knowledge of a specific language
nor transcription of phonetic data. Furthermore, the acoustic
approach captures the differences in spectral features between
languages and directly models the distribution of the spectral
features of the speech utterances in each language. The
acoustic system based on the i-vector approach [5], which provides
superior performance, has become the state of the art in the
language recognition field.
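As a rough illustration of the phonotactic idea described above (not the method proposed in this paper), the following minimal Python sketch trains one phone bigram model per language and assigns a test phone sequence to the language whose model gives it the highest log-likelihood. The phone strings, the language names lang_a and lang_b, the add-one smoothing, and all function names are assumptions made for this example only; a real system would obtain phone sequences from a phone recognizer and would typically use higher-order n-grams.

```python
import math
from collections import Counter


def train_bigram_model(phone_sequences, vocab):
    """Return a function giving add-one-smoothed bigram log-probabilities."""
    bigrams = Counter()
    contexts = Counter()
    for seq in phone_sequences:
        padded = ["<s>"] + list(seq)  # "<s>" marks the utterance start
        for prev, cur in zip(padded, padded[1:]):
            bigrams[(prev, cur)] += 1
            contexts[prev] += 1
    vocab_size = len(vocab)

    def logprob(prev, cur):
        # Add-one smoothing keeps unseen bigrams from having zero probability.
        return math.log((bigrams[(prev, cur)] + 1) / (contexts[prev] + vocab_size))

    return logprob


def score(logprob, seq):
    """Log-likelihood of a phone sequence under one language's bigram model."""
    padded = ["<s>"] + list(seq)
    return sum(logprob(p, c) for p, c in zip(padded, padded[1:]))


def classify(models, seq):
    """Pick the language whose model scores the sequence highest."""
    return max(models, key=lambda lang: score(models[lang], seq))


# Toy, made-up phone sequences (hypothetical data, for illustration only).
training = {
    "lang_a": [["k", "a", "t", "a"], ["t", "a", "k", "a"]],
    "lang_b": [["s", "t", "r", "i"], ["s", "t", "i", "r"]],
}
vocab = sorted({p for seqs in training.values() for s in seqs for p in s})
models = {lang: train_bigram_model(seqs, vocab) for lang, seqs in training.items()}

predicted = classify(models, ["k", "a", "t", "a"])
print(predicted)
```

The sketch also makes the shortcoming noted above concrete: the training dictionary must contain phone-level transcriptions for every language, which is exactly the resource the acoustic approach avoids.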
The performance of the overall language recognition system
depends on preprocessing techniques, feature extraction,
and classification techniques. Some research studies have focused
on feature extraction to improve the performance of SLR
systems. A typical acoustic-based SLR system uses the Gaussian
mixture model (GMM) [6, 7] to model conventional
Hindawi Publishing Corporation
Mathematical Problems in Engineering
Volume 2014, Article ID 250160, 16 pages
http://dx.doi.org/10.1155/2014/250160