A New Approach for Automatic Tone Error Detection in Strong Accented Mandarin Based on Dominant Set

Taotao Zhu, Dengfeng Ke, Zhenbiao Chen, Bo Xu
Institute of Automation, Chinese Academy of Sciences, Beijing 100190, China
{taotao, dfke, zbchen, xubo}@hitic.ia.ac.cn

Abstract
In this paper, we propose a new approach based on dominant sets [1] for tone error detection in strongly accented Mandarin. First, the final boundaries generated by forced alignment are adjusted according to the F0 contour so that the final domain is located more accurately. Next, suitable normalization techniques for the tone features are explored. Finally, clustering and classification methods based on dominant sets are applied to detect tone errors. The proposed approach is compared with a traditional k-means based method. Experimental results show that it performs better, achieving an average cross-correlation (CC) of 0.84 between human and machine judgments, which reaches the level of agreement between human raters and verifies the effectiveness of the approach. A further advantage of the approach is that it not only identifies tone mispronunciations well but also provides the F0 pattern of each tone error as informative feedback.

Index Terms: CALL (Computer Assisted Language Learning), tone error detection, dominant set, forced alignment, F0

1. Introduction
Mandarin is a tonal language. Its four commonly used tones, high, rising, low, and falling, are also denoted Tones 1 to 4. Tone plays a significant role in everyday communication because many words are distinguished solely by tone. However, because of speakers' dialects, tone is more difficult to pronounce correctly than initials and finals. Most people in China speak both their native dialect and Mandarin, and the quality of their Mandarin pronunciation depends on how well they have mastered the language.
Therefore, detecting tone errors is an important component of Mandarin CALL systems [2], which aim to help language learners correct and improve their pronunciation throughout the learning process.

In this paper, a new tone error detection approach based on clustering is proposed. The main idea is as follows: first, for the pronunciations of each tone, the corresponding positive and negative tone clusters are obtained by clustering; then each test tone is assigned to its most correlated cluster using a similarity (or distance) measure. A straightforward way to detect tone errors in Mandarin is tone recognition. In real conditions, however, a speaker's dialectal accent means that the pitch contour of a tone does not always follow the four canonical Mandarin tones, and this tonal modification leads to relatively low tone recognition accuracy [3]. To capture the pitch variations of different tones more accurately, we therefore present a clustering-based approach to tone error detection.

This work was supported by a grant from the National Natural Science Foundation of China No. 90820303.

In this work, traditional model-based approaches such as HMMs or GMMs are not used to train the tone clusters, because these methods require a large number of samples to learn the feature distribution, whereas the samples of different tone categories in the "tone error space" are unevenly distributed, and some categories of tone errors are incomplete and difficult to collect. Instead, we use a dominant set based approach for tone error detection because of its efficiency and directness in clustering and classification. Dominant sets were proposed by Pavan et al. [1], [4], and the corresponding clustering technique, dominant set clustering (DSC), has been widely used in image segmentation and video processing owing to its intuitiveness, inherent hierarchical nature, and superior clustering accuracy and stability. Unlike traditional clustering algorithms, it determines the number of clusters automatically at low computational cost. Furthermore, dominant sets can also be used for classification [5]. We therefore apply them to the task of tone error detection.

The remainder of this paper is organized as follows. Related work is discussed in Section 2; Section 3 formulates the dominant set clustering and classification algorithms; Section 4 describes our tone error detection approach; experimental results and analysis are given in Section 5; and Section 6 concludes the paper and outlines future work.

2. Related works
Great progress has been made in CALL systems over the last decade. Following the GOP (goodness of pronunciation) score used by Witt [2], many studies have been conducted, most of them based on posterior probabilities [6] or pronunciation rules [7] derived from state-of-the-art speech recognition. In Franco et al. [6], posterior-based methods with native models are preferred for detection tasks. Ito et al. [7] introduce decision-tree-based error clustering with multiple thresholds. These works mainly focus on segmental pronunciation errors. By contrast, tone has received much less attention in the CALL literature. Pan et al. [3] use posterior probabilities generated by GMMs for tone assessment of strongly accented Mandarin speech. Zhang et al. [8] use log-posterior probabilities as the GOP score for tone mispronunciation detection under an MSD-HMM framework. Wei et al. [9] utilize HMMs to detect tone errors, using F0 normalized by CDF matching as the feature for the tone models.
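To make the posterior-based scoring above concrete, the following Python sketch computes a frame-averaged log posterior for the canonical tone of one final and compares it with a threshold. It is a generic simplification of GOP-style scores, not the exact formulation of [2], [3], [8] or [9]; the posterior values, the threshold values, and the helper names `log_posterior_score` and `is_tone_error` are hypothetical. The same score is judged correct or erroneous depending only on the threshold chosen.

```python
import math


def log_posterior_score(frame_posteriors, canonical):
    """Average log posterior of the canonical tone over a segment's frames.

    frame_posteriors: one list of per-tone posteriors per frame (each row sums to 1).
    canonical: index of the expected tone (0 = Tone 1, ..., 3 = Tone 4).
    """
    eps = 1e-12  # guards against log(0)
    total = sum(math.log(frame[canonical] + eps) for frame in frame_posteriors)
    return total / len(frame_posteriors)


def is_tone_error(score, threshold):
    """Flag a mispronunciation when the score falls below a hand-tuned threshold."""
    return score < threshold


if __name__ == "__main__":
    # Hypothetical posteriors for Tones 1-4 over three frames of one final.
    posteriors = [
        [0.10, 0.60, 0.10, 0.20],
        [0.05, 0.50, 0.15, 0.30],
        [0.10, 0.40, 0.20, 0.30],
    ]
    score = log_posterior_score(posteriors, canonical=1)  # expected tone is Tone 2
    for threshold in (-1.0, -0.5):
        print(threshold, is_tone_error(score, threshold))  # → -1.0 False, then -0.5 True
```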
The tone error detection methods in [3], [8], [9] are all based on posterior probabilities. Although these methods can perform well, all of them are heavily threshold-dependent. Compared with this previous work, our approach avoids this problem while reaching a high average CC between human and machine.

Copyright © 2010 ISCA. INTERSPEECH 2010, 26-30 September 2010, Makuhari, Chiba, Japan. Page 777, doi: 10.21437/Interspeech.2010-283
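The clustering idea described in Section 1 can be illustrated with a small Python sketch. It extracts dominant sets one at a time with discrete replicator dynamics, x_i <- x_i (Ax)_i / (x^T A x), the optimization commonly associated with dominant sets [1], [4]; samples with non-negligible weight form a cluster, and the procedure repeats on the remaining samples, so the number of clusters is not fixed in advance. A test sample is then assigned to the cluster with the highest weighted similarity. The Gaussian similarity, `sigma`, the support threshold `support_eps`, the assignment rule, and all function names are assumptions for illustration; the paper's own formulation is given in Section 3 and may differ.

```python
import math


def gaussian_similarity(a, b, sigma):
    """exp(-||a - b||^2 / sigma^2); the kernel and sigma are illustrative assumptions."""
    sq_dist = sum((ai - bi) ** 2 for ai, bi in zip(a, b))
    return math.exp(-sq_dist / sigma ** 2)


def replicator_dynamics(affinity, max_iter=1000, tol=1e-8):
    """Iterate x_i <- x_i (A x)_i / (x^T A x), starting from the simplex barycenter."""
    n = len(affinity)
    x = [1.0 / n] * n
    for _ in range(max_iter):
        ax = [sum(a_ij * x_j for a_ij, x_j in zip(row, x)) for row in affinity]
        # x^T A x: positive whenever A has a positive entry (checked by the caller).
        fitness = sum(x_i * ax_i for x_i, ax_i in zip(x, ax))
        x_new = [x_i * ax_i / fitness for x_i, ax_i in zip(x, ax)]
        converged = sum(abs(u - v) for u, v in zip(x_new, x)) < tol
        x = x_new
        if converged:
            break
    return x


def dominant_set_clustering(features, sigma, support_eps=1e-5):
    """Peel off dominant sets until every sample belongs to one.

    Returns (member_indices, member_weights) pairs; the number of clusters
    is not fixed in advance.
    """
    n = len(features)
    # Similarity matrix with zero diagonal (no self-similarity).
    affinity = [[0.0 if i == j else gaussian_similarity(features[i], features[j], sigma)
                 for j in range(n)] for i in range(n)]
    remaining = list(range(n))
    clusters = []
    while remaining:
        sub = [[affinity[i][j] for j in remaining] for i in remaining]
        if len(remaining) == 1 or not any(any(row) for row in sub):
            # No positive affinities left: each remaining sample forms its own cluster.
            clusters.extend(([i], [1.0]) for i in remaining)
            break
        x = replicator_dynamics(sub)
        members = [k for k, x_k in enumerate(x) if x_k > support_eps]
        total = sum(x[k] for k in members)
        clusters.append(([remaining[k] for k in members], [x[k] / total for k in members]))
        kept = set(members)
        remaining = [i for k, i in enumerate(remaining) if k not in kept]
    return clusters


def assign_to_cluster(sample, features, clusters, sigma):
    """Index of the cluster with the highest weighted similarity to the sample."""
    scores = [sum(w * gaussian_similarity(sample, features[i], sigma) for i, w in zip(idx, ws))
              for idx, ws in clusters]
    return max(range(len(scores)), key=scores.__getitem__)


if __name__ == "__main__":
    # Hypothetical 2-D feature vectors: a group of three and a group of two.
    train = [[0.0, 0.0], [0.1, 0.0], [0.0, 0.1], [5.0, 5.0], [5.1, 5.0]]
    clusters = dominant_set_clustering(train, sigma=1.0)
    print([idx for idx, _ in clusters])  # → [[0, 1, 2], [3, 4]]
    print(assign_to_cluster([5.05, 5.0], train, clusters, sigma=1.0))  # → 1
```

In the setting described in Section 1, each cluster found for a tone would additionally carry a positive or negative label, so the cluster a test tone falls into decides whether it is flagged as an error; that labeling step is omitted from this sketch.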