IEEE TRANSACTIONS ON AUDIO, SPEECH, AND LANGUAGE PROCESSING, VOL. 14, NO. 4, JULY 2006 1245
A New Independent Component Analysis for
Speech Recognition and Separation
Jen-Tzung Chien, Senior Member, IEEE, and Bo-Cheng Chen
Abstract—This paper presents a novel nonparametric likeli-
hood ratio (NLR) objective function for independent component
analysis (ICA). This function is derived through the statistical
hypothesis test of independence of random observations. A likeli-
hood ratio function is developed to measure the confidence toward
independence. We accordingly estimate the demixing matrix by
maximizing the likelihood ratio function and apply it to transform
data into independent component space. Conventionally, the test of
independence was established under the assumption of Gaussian-distributed
data, which is inappropriate for ICA. To avoid assuming
Gaussianity in hypothesis testing, we propose a nonparametric
approach where the distributions of random variables are calcu-
lated using kernel density functions. A new ICA is then realized
through the NLR objective function. Interestingly, we apply
the proposed NLR-ICA algorithm for unsupervised learning
of unknown pronunciation variations. The clusters of speech
hidden Markov models are estimated to characterize multiple
pronunciations of subword units for robust speech recognition.
Also, the NLR-ICA is applied to separate the linear mixture of
speech and audio signals. In the experiments, NLR-ICA achieves
better speech recognition performance compared to parametric
and nonparametric minimum mutual information ICA.
Index Terms—Acoustic modeling, blind source separation (BSS),
independent component analysis (ICA), nonparametric likelihood
ratio (NLR), pronunciation variation, speech recognition, unsuper-
vised learning.
I. INTRODUCTION
INDEPENDENT component analysis (ICA) [11] has attracted
researchers in the signal processing and neural network
communities for many years, because the ICA principle
is essential for dealing with fundamental issues of blind
source separation, blind deconvolution, feature extraction, unsu-
pervised learning, data analysis, and compression. Many appli-
cations have been developed for text document clustering [21],
facial feature representation [40], image enhancement, financial
data analysis, and neurobiological signal processing [17]. For
applications in speech processing, ICA was employed for
extraction of salient speech features [25], analysis of speaker
variations [18], separation of multiple speakers [24], and can-
cellation of reverberation in speech signals [4]. ICA was also
applied to establish basis functions for individual speakers for
speaker recognition [20]. In general, an ICA network is a higher-
order and nonlinear extension of principal component analysis
Manuscript received November 4, 2004; revised May 6, 2005. This work was
supported in part by the National Science Council, Taiwan, R.O.C., under Con-
tract NSC94-2213-E006-017. The associate editor coordinating the review of
this manuscript and approving it for publication was Dr. Li Deng.
The authors are with the Department of Computer Science and Information
Engineering, National Cheng Kung University, Tainan, Taiwan 70101, R.O.C.
(e-mail: jtchien@mail.ncku.edu.tw).
Digital Object Identifier 10.1109/TSA.2005.858061
(PCA), which enables the network to separate statistically inde-
pendent components [4], [29]. It has the capability of exploring
latent factors in random signals with strong relations to factor
analysis (FA) [2], [16], [19]. The main idea of ICA is to
find the independent sources underlying mixed data, rather than
reducing the feature dimension along directions of maximum
variance via PCA or extracting the Gaussian common factors
embedded in unknown data via FA.
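As a minimal numerical illustration of this distinction (our own NumPy sketch, not part of the paper; the mixing matrix A and the uniform sources are hypothetical choices): PCA whitening removes all second-order correlation between two mixed non-Gaussian sources, yet a fourth-order cross-cumulant of the whitened signals remains far from zero, exposing the higher-order dependence that only a further ICA rotation can remove.

```python
import numpy as np

rng = np.random.default_rng(0)

# Two independent, non-Gaussian (uniform) sources; values are illustrative.
n = 100_000
s = rng.uniform(-1.0, 1.0, size=(2, n))

# Mix with a hypothetical mixing matrix A.
A = np.array([[1.0, 0.6],
              [0.4, 1.0]])
x = A @ s

# PCA whitening decorrelates the mixtures (second-order statistics only).
x = x - x.mean(axis=1, keepdims=True)
d, E = np.linalg.eigh(np.cov(x))
z = np.diag(d ** -0.5) @ E.T @ x

# The covariance of z is (numerically) the identity: PCA has done its job.
print(np.round(np.cov(z), 3))

# Yet a fourth-order cross-cumulant of the whitened signals stays far from
# zero, revealing the higher-order dependence that ICA removes by a rotation.
c22 = np.mean(z[0] ** 2 * z[1] ** 2) - 1.0 - 2.0 * np.mean(z[0] * z[1]) ** 2
print(round(c22, 3))
```

For independent whitened signals this cross-cumulant would vanish; its magnitude here is what an ICA contrast function penalizes while PCA ignores it.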
Basically, ICA is powerful in capturing the unknown structure
of data by minimizing the statistical dependence of different
components. It is meaningful to use ICA to solve the blind
source separation (BSS) problem. BSS aims to estimate a
demixing matrix W, which separates the mixed signal x
and recovers the original independent components s = Wx.
The observed signal was mixed with an unknown matrix A,
x = As. The key idea of estimating the ICA model is to
maximize non-Gaussianity so as to achieve the independence
of sources [17]. Traditionally, high-order
statistics and information-theoretic criteria were exploited to
measure the non-Gaussianity or independence. For example, the
high-order statistic given by the absolute value of kurtosis was
maximized to find independent components. However, kurtosis is
sensitive to outliers [17]. Also, measurements using negentropy,
the likelihood function, and mutual information were popular
for constructing the ICA model. The mutual information between the
transformed sources was minimized to find the demixing matrix.
Such optimization was shown to be equivalent to maximum
likelihood and maximum negentropy principles under some as-
sumptions [17]. In addition to the BSS problem, ICA was useful
for unsupervised learning [30]. In [33], ICA was used to sep-
arate a multivariate distribution into a mixture of independent
components. Because ICA inherently explores data structure,
it was also attractive for deriving decision-tree-based unsupervised
classification [32]. In [26], ICA mixture models
were presented for unsupervised classification of non-Gaussian
classes. A local ICA model was formed to represent nonlinear
data distributions by merging linear ICA and k-means
clustering [22].
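The kurtosis-based estimation discussed above can be sketched as follows (our own toy example, not the paper's NLR algorithm; the mixing matrix, source distributions, and FastICA-style fixed-point update with g(u) = u^3 are assumptions made for illustration):

```python
import numpy as np

rng = np.random.default_rng(1)

# Toy BSS problem: a sub-Gaussian and a super-Gaussian source, linearly mixed.
n = 50_000
s = np.vstack([rng.uniform(-1.0, 1.0, n),     # negative kurtosis
               rng.laplace(0.0, 1.0, n)])     # positive kurtosis
A = np.array([[0.8, 0.3],
              [0.2, 0.9]])                    # "unknown" mixing matrix
x = A @ s

# Whitening reduces the demixing problem to finding a rotation.
x = x - x.mean(axis=1, keepdims=True)
d, E = np.linalg.eigh(np.cov(x))
z = np.diag(d ** -0.5) @ E.T @ x

# Fixed-point iteration driving |kurtosis| to an extremum (g(u) = u^3),
# with deflation so the second row stays orthogonal to the first.
W = np.zeros((2, 2))
for i in range(2):
    w = rng.standard_normal(2)
    w /= np.linalg.norm(w)
    for _ in range(200):
        w_new = (z * (w @ z) ** 3).mean(axis=1) - 3.0 * w
        w_new -= W[:i].T @ (W[:i] @ w_new)    # deflation step
        w_new /= np.linalg.norm(w_new)
        converged = abs(abs(w_new @ w) - 1.0) < 1e-9
        w = w_new
        if converged:
            break
    W[i] = w

# Recovered components match the sources up to permutation, sign, and scale.
y = W @ z
C = np.abs(np.corrcoef(np.vstack([y, s]))[:2, 2:])
print(np.round(C, 3))   # each row has one entry close to 1
```

The nonparametric likelihood ratio objective proposed in this paper replaces the kurtosis contrast in such an iteration with a test statistic built from kernel density estimates, avoiding both the outlier sensitivity of kurtosis and the Gaussian assumption of classical independence tests.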
Actually, owing to the power of ICA in solving fundamental
problems, we can apply ICA methods to speech feature
extraction as well as acoustic modeling for speech recognition.
In [18], the first and second independent components
of speech features were shown to contain gender and accent
information, respectively, which were used to improve speech
recognition performance. In acoustic modeling for large vocabulary
continuous speech recognition (LVCSR), we characterize
context-dependent subword units using speech data from
a pool of speakers with varying genders, accents, ages, emo-
tions, etc. The top-down clustering based on decision tree state