Multi-level processing of phonetic variants in speech production and visual word processing: evidence from Mandarin lexical tones

Jessie S. Nixon a,b*, Yiya Chen a,b and Niels O. Schiller a,b

a Leiden Institute for Brain and Cognition (LIBC), Leiden University, 2300 RB Leiden, The Netherlands; b Leiden University Centre for Linguistics (LUCL), Leiden University, 2300 RB Leiden, The Netherlands

(Received 9 August 2013; accepted 19 June 2014)

Two picture-word interference experiments provide new evidence on the nature of phonological processing in speech production and visual word processing. In both experiments, responses were significantly faster either when distractor and target matched in tone category but had different overt realisations (toneme condition), or when target and distractor matched in overt realisation but mismatched in tone category (contour condition). Tone 3 sandhi is an allophone of Beijing Mandarin Tone 3 (T3); its contour is similar to that of another tone, Tone 2. In Experiment 1, sandhi picture naming was faster with contour (Tone 2) and toneme (low Tone 3) distractors, compared to control distractors. This indicates that both category-level and context-specific representations are activated in sandhi word production. In Experiment 2, both contour (Tone 2) and toneme (low Tone 3) picture naming was facilitated by visually presented sandhi distractors, compared to controls, evidence that category-level and context-specific instantiated representations are automatically activated during processing of visually presented words. Combined, the results point to multi-level processing of phonology, whether words are overtly produced or processed visually. Interestingly, there were differences in the time course of the effects.

Keywords: speech production; Mandarin Chinese; lexical tone; phonetic variation; sub-phonemic detail; phonological processing; picture-word interference

How are the sounds of language stored in memory and accessed during language production?
Early accounts assumed phonology to be processed in terms of (optimally) functional units that distinguish between lexical items: phonemes. Phonemes were conceptualised as abstract, idealised representations of sound (Foss & Swinney, 1973; Meyer, 1990, 1991; Roelofs, 1999). In most experiments investigating phonology, phonological relatedness is measured in terms of phoneme overlap. In addition, some of the most influential models of language production (Dell, 1986, 1988; Indefrey & Levelt, 2004; Levelt, 2001; Levelt, Roelofs, & Meyer, 1999) posit that lexical access involves activation of sequences of phonemes. Phonemes (e.g. /t/ or /k/) are the smallest units of sound that distinguish between words in a particular language (e.g. top vs. cop in English). In contrast, allophones vary with phonetic context but do not affect word meaning. For example, word-initially, English /t/ is aspirated (has a puff of air, e.g. top), but is unaspirated (no puff of air) following /s/ (e.g. stop).

Experimental evidence suggests that phoneme-like generalisation plays a role in online speech processing. For instance, in a perceptual learning experiment, McQueen, Cutler, and Norris (2006) had Dutch participants perform a training phase of auditory lexical decisions to words in which either the final /f/ or the final /s/ was replaced by an ambiguous (f-s) fricative sound. These words created a lexical bias to interpret the ambiguous sound as a particular phoneme. For example, participants in the ambiguous /f/ condition heard [witlɔ?], where witlof is a real Dutch word but witlos is not, thereby creating a bias to interpret the ambiguous sound as an /f/. In the following test phase, participants made lexical decisions to visually presented minimal-pair words (e.g. doof 'deaf'; doos 'box') preceded by auditory primes containing the ambiguous sound (e.g. doo?). Facilitation depended on which ambiguous phoneme participants were trained with.
Participants who heard the ambiguous sound in /f/-words during training were faster to identify visually presented /f/-words (e.g. doof), whereas participants who heard the ambiguous /s/ were faster to identify /s/-words (e.g. doos). Participants had adjusted (re-tuned) their perceptual categories by matching the distorted sound to lexical items stored in memory. Importantly, since different sets of words were used in training and test, re-tuning was not restricted to specific words, but instead must have generalised to elements common to both training and test words; that is, to phoneme categories. Similarly, McLennan, Luce, and Charles-Luce (2003) found evidence for category-level processing in production. In American English, word-medial /d/ and /t/ are

*Corresponding author. Email: jess.s.nixon@gmail.com

Language, Cognition and Neuroscience, 2014
http://dx.doi.org/10.1080/23273798.2014.942326
© 2014 Taylor & Francis