Automatic Construction of Regression Class Tree for MLLR via Model-based Hierarchical Clustering

Shih-Sian Cheng 1,2, Yeong-Yuh Xu 1, Hsin-Min Wang 2, and Hsin-Chia Fu 1

1 Department of Computer Science, National Chiao-Tung University, Hsinchu, Taiwan
{yyxu, hcfu}@csie.nctu.edu.tw
2 Institute of Information Science, Academia Sinica, Taipei, Taiwan
{sscheng, whm}@iis.sinica.edu.tw

Abstract. In this paper, we propose a model-based hierarchical clustering algorithm that automatically builds a regression class tree for the well-known speaker adaptation technique Maximum Likelihood Linear Regression (MLLR). When building a regression class tree, the mean vectors of the Gaussian components of the model set of a speaker-independent CDHMM-based speech recognition system are collected as the input data for clustering. The proposed algorithm comprises two stages. First, the input data (i.e., all the Gaussian mean vectors of the CDHMMs) are iteratively partitioned by a divisive hierarchical clustering strategy, and the Bayesian Information Criterion (BIC) is applied to determine the number of clusters (i.e., the base classes of the regression class tree). Then, the regression class tree is built by iteratively merging these base clusters using an agglomerative hierarchical clustering strategy, which also uses BIC as the merging criterion. We evaluated the proposed regression class tree construction algorithm on a Mandarin Chinese continuous speech recognition task. Compared to the regression class tree implementation in HTK, the proposed algorithm is more effective in building the regression class tree and can determine the number of regression classes automatically.

Keywords: speaker adaptation, MLLR, regression class tree

1 Introduction

MLLR [1] is well known for its ability to perform rapid and robust speaker adaptation with a small amount of adaptation data.
Extensive research efforts have been made to improve MLLR [8, 13] as well as to develop new methods that extend the conventional MLLR framework [2-7]. In the MLLR proposed by Leggetter and Woodland [1], adaptation of the speaker-independent (SI) model parameters (e.g., the mean parameters of a CDHMM-based speech recognition system) is carried out via a set of linear transformations, where each regression (transformation) matrix is responsible for the adaptation of one regression class (a subset of the model parameters). To enhance flexibility and