IEEE TRANSACTIONS ON AUDIO, SPEECH, AND LANGUAGE PROCESSING, VOL. 14, NO. 4, JULY 2006

A New Independent Component Analysis for Speech Recognition and Separation

Jen-Tzung Chien, Senior Member, IEEE, and Bo-Cheng Chen

Abstract—This paper presents a novel nonparametric likelihood ratio (NLR) objective function for independent component analysis (ICA). The function is derived from a statistical hypothesis test of the independence of random observations: a likelihood ratio function is developed to measure the confidence toward independence, and the demixing matrix is estimated by maximizing this likelihood ratio and then applied to transform data into an independent component space. Conventionally, tests of independence were established under the assumption of Gaussian data distributions, which is improper for realizing ICA. To avoid assuming Gaussianity in hypothesis testing, we propose a nonparametric approach in which the distributions of the random variables are calculated using kernel density functions. A new ICA is then fulfilled through the NLR objective function. Interestingly, we apply the proposed NLR-ICA algorithm to unsupervised learning of unknown pronunciation variations: clusters of speech hidden Markov models are estimated to characterize multiple pronunciations of subword units for robust speech recognition. NLR-ICA is also applied to separate linear mixtures of speech and audio signals. In the experiments, NLR-ICA achieves better speech recognition performance than parametric and nonparametric minimum mutual information ICA.

Index Terms—Acoustic modeling, blind source separation (BSS), independent component analysis (ICA), nonparametric likelihood ratio (NLR), pronunciation variation, speech recognition, unsupervised learning.

I. INTRODUCTION

INDEPENDENT component analysis (ICA) [11] has attracted researchers in the signal processing and neural network communities for many years.
This is because the ICA principle is essential for dealing with fundamental issues of blind source separation, blind deconvolution, feature extraction, unsupervised learning, data analysis, and compression. Many applications have been developed for text document clustering [21], facial feature representation [40], image enhancement, financial data analysis, and neurobiological signal processing [17]. In speech processing, ICA has been employed for the extraction of salient speech features [25], analysis of speaker variations [18], separation of multiple speakers [24], and cancellation of reverberation in speech signals [4]. ICA was also applied to establish basis functions for individual speakers in speaker recognition [20]. In general, an ICA network is a higher-order, nonlinear extension of principal component analysis (PCA), which enables the network to separate statistically independent components [4], [29]. It has the capability of exploring latent factors in random signals and is strongly related to factor analysis (FA) [2], [16], [19]. The main idea of ICA is to find the independent sources in mixed data, rather than to reduce the feature dimension toward the direction of maximum variance, as in PCA, or to extract the Gaussian common factors embedded in unknown data, as in FA. Basically, ICA is powerful at capturing the unknown structure of data by minimizing the statistical dependence of different components.

Manuscript received November 4, 2004; revised May 6, 2005. This work was supported in part by the National Science Council, Taiwan, R.O.C., under Contract NSC94-2213-E006-017. The associate editor coordinating the review of this manuscript and approving it for publication was Dr. Li Deng. The authors are with the Department of Computer Science and Information Engineering, National Cheng Kung University, Tainan, Taiwan 70101, R.O.C. (e-mail: jtchien@mail.ncku.edu.tw). Digital Object Identifier 10.1109/TSA.2005.858061
It is meaningful to use ICA to solve the blind source separation (BSS) problem. BSS aims to estimate a demixing matrix W that separates the mixed signal x and recovers its original independent components s. The observed signal is generated by mixing the sources with an unknown matrix A, i.e., x = As, and the recovered components are given by y = Wx. The key idea in estimating the ICA model is to maximize non-Gaussianity so as to achieve independence of the sources [17]. Traditionally, higher-order statistics and information-theoretic criteria were exploited to measure non-Gaussianity or independence. For example, the absolute value of the kurtosis, a higher-order statistic, was maximized to find independent components; however, kurtosis is sensitive to outlier data [17]. Measures based on negentropy, the likelihood function, and mutual information have also been popular for constructing ICA models. The mutual information between the transformed sources is minimized to find the demixing matrix, an optimization shown to be equivalent to the maximum likelihood and maximum negentropy principles under some assumptions [17]. In addition to the BSS problem, ICA is useful for unsupervised learning [30]. In [33], ICA was used to separate a multivariate distribution into a mixture of independent components. Because ICA is inherently suited to exploring data structure, it was also attractive for deriving decision tree-based unsupervised classification [32]. In [26], ICA mixture models were presented for unsupervised classification of non-Gaussian classes. A local ICA model was formed to represent nonlinear data distributions by merging linear ICA with k-means clustering [22]. Given the power of ICA for solving such fundamental problems, we can apply ICA methods to speech feature extraction as well as acoustic modeling for speech recognition.
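The mixing/demixing model and the classical kurtosis-based contrast discussed above can be illustrated with a minimal NumPy sketch. This is not the paper's NLR method but the conventional FastICA-style fixed-point baseline it builds on; all signal shapes, the example mixing matrix, and the iteration count are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 5000

# Two independent non-Gaussian sources s (illustrative choice):
# a Laplacian signal and a uniform signal.
s = np.vstack([rng.laplace(size=n), rng.uniform(-1.0, 1.0, size=n)])

# Observations x = A s, with an assumed unknown mixing matrix A.
A = np.array([[1.0, 0.6],
              [0.4, 1.0]])
x = A @ s

# Whiten the observations: zero mean, identity covariance.
x = x - x.mean(axis=1, keepdims=True)
d, E = np.linalg.eigh(np.cov(x))
z = (E @ np.diag(d ** -0.5) @ E.T) @ x

# One-unit fixed-point iteration maximizing |kurtosis| of w^T z:
# w <- E[z (w^T z)^3] - 3 w, then renormalize (valid for whitened z).
w = rng.normal(size=2)
w /= np.linalg.norm(w)
for _ in range(100):
    wz = w @ z
    w_new = (z * wz ** 3).mean(axis=1) - 3.0 * w
    w_new /= np.linalg.norm(w_new)
    converged = abs(w_new @ w) > 1.0 - 1e-10
    w = w_new
    if converged:
        break

# Estimated independent component (recovered up to sign and scale).
y = w @ z
```

After convergence, y correlates strongly with one of the original sources; which one is recovered depends on the random initialization, reflecting the usual sign/permutation ambiguity of ICA.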
In [18], the first and second independent components of speech features were shown to contain gender and accent information, respectively, which was used to improve speech recognition performance. In acoustic modeling for large vocabulary continuous speech recognition (LVCSR), we characterize context-dependent subword units using speech data from a pool of speakers with varying genders, accents, ages, emotions, etc. The top-down clustering based on decision tree state