Predicting tonal realizations in one Chinese dialect from another

Junru Wu a,b,*, Yiya Chen b, Vincent J. van Heuven b,c, Niels O. Schiller b

a Dept. Chinese Language and Literature, East China Normal University, 500 Dongchuan Rd., Shanghai 200241, China
b Leiden University Centre for Linguistics, Leiden Institute for Brain and Cognition, The Netherlands
c Dept. Applied Linguistics, University of Pannonia, Egyetem utca 10, Veszprém, Hungary

Received 16 February 2015; received in revised form 20 October 2015; accepted 29 October 2015
Available online 5 November 2015

Abstract

Pronunciation dictionaries are usually expensive and time-consuming to prepare for the computational modeling of human languages, especially when the target language is under-resourced. Northern Chinese dialects are often under-resourced but used by a significant number of speakers. They share the basic sound inventories with Standard Chinese (SC). Also, their words usually share the segmental realizations and logographic written forms with the SC translation equivalents. Hence the pronunciation dictionaries of northern Chinese dialects could be easily available if we were able to predict the tonal realizations of the dialect words from the tonal information of their SC counterparts. This paper applies statistical modeling to investigate the tonal aspect of the related words between a northern dialect, i.e. Jinan Mandarin (JM), and Standard Chinese (SC). Multi-linear regression models were built with between-word pitch distance of JM words as the dependent variable and the following were included as the predictors: SC tonal relations, between-dialect tonal identity, and individual backgrounds. The results showed that tonal relations in SC and between-dialect identity, as predictors featuring the relation between the JM and SC tonal systems, are significant and robust predictors of JM tonal realizations.
The speakers’ sociolinguistic and cognitive backgrounds, together with the tonal merger and neutral tone information within JM, are important for the prediction of JM tonal realizations and affect the way that between-language predictors take effect.

© 2015 Elsevier B.V. All rights reserved.

Keywords: Tone; Translation equivalents; Cognates; Modeling; Individual backgrounds

Speech Communication 76 (2015) 1–27, ISSN 0167-6393
http://dx.doi.org/10.1016/j.specom.2015.10.006
Corresponding author at: Dept. Chinese Language and Literature, East China Normal University, 500 Dongchuan Rd., Shanghai 200241, China. Tel.: +86 (0)2154344874. E-mail address: jrwu@zhwx.ecnu.edu.cn (J. Wu).

1. Introduction

1.1. The necessity and sufficiency of modeling under-resourced northern Chinese dialects

Under-resourced languages, characterized by the “lack of a unique writing system or stable orthography, limited presence on the web, lack of linguistic expertise, and lack of electronic resources for speech and language processing” (Besacier et al., 2014: 27), have always been a challenge for both engineers of Human Language Technologies (HLT) and linguists. One of the main reasons behind this challenge is the large amount of phonetic data required, which can be both difficult and expensive to acquire. To tackle this challenge, more and more researchers are transferring information from a related language or dialect to improve the understanding and automatic machine-processing of the under-resourced language. For instance, the automatic speech recognition of Afrikaans was significantly improved using the available Dutch data (Imseng et al., 2014). However, to better incorporate the information from the related language, we need a better understanding of the relations between the two languages or dialects. In this respect, linguists have carried out studies