Paralinguistic Microphone

Alex McLean
Interdisciplinary Centre for Scientific Research in Music
University of Leeds, UK
a.mclean@leeds.ac.uk

EunJoo Shin
Incheon Catholic University
Yeonsu-gu 406-849
Incheon, Korea
eunjooshin@gmail.com

Kia C. Ng
Interdisciplinary Centre for Scientific Research in Music
University of Leeds, UK
k.c.ng@leeds.ac.uk

ABSTRACT
The human vocal tract is considered for its sonorous qualities in carrying prosodic information, which implicates vision in the perceptual processes of speech. These considerations are put in the context of previous work in NIME, forming the background for the introduction of two sound installations: “Microphone”, which uses a camera and computer vision to translate mouth shapes to sounds, and “Microphone II”, a work in progress, which adds physical modelling synthesis as a sound source, and visualisation of mouth movements.

Keywords
face tracking, computer vision, installation, microphone

1. INTRODUCTION
The human voice is a highly adapted carrier of language, but in the digital age its articulation of paralinguistic qualities is often overlooked. This is because much of the expressive range and subtlety of the voice lies outside what is commonly notated in the typewritten word. Through interactive sound installation, we have developed an approach which focuses on the sonorous qualities of the voice as carriers of paralinguistic communication. In the following we consider our work against a diverse background, including the psychology of perception, and build a theoretical basis for wider consideration of related works in New Interfaces for Musical Expression (NIME) and related fields.

2. SOUND AND SHAPE
In the human vocal tract, the relationship between sound, shape and articulation is clear. This relationship is visceral, and firmly grounded in perception; watching lips move can create a very real experience of hearing sounds which are not there (e.g. the McGurk effect; McGurk and MacDonald 1976).
This has been shown to generalise to watching abstract movements (Rosenblum and Saldaña 1996), demonstrating shared resources for movement and sound in our perceptual faculties.

The relationship between sound and shape is a recurring subject of interest for artists and musicians. One example in digital art is Takeluma, an alphabet based on mouth shape (Cho 2005), with reference to the similar properties of Hangul, the native yet invented alphabet of the Korean language. What these systems have in common is that they notate sound with shape. As already noted, there is strong indication that the perception of speech is informed by visual perception of shape, and by kinaesthetic perception of its articulation, complementing sonic perception via the cochlea. This relates to the use of vocable words in music, where musicians use words to describe instrumental articulations, connecting their voice to their instrument in a process which often amounts to onomatopoeia (Chambers 1980; McLean and Wiggins 2008).

As Neumark (2010) describes, the voice is both sonorous and signifying, and both embodied and between bodies. These are apparent paradoxes, but our present work brings attention to sonorous qualities as meaningful in their own right, in as much as abstract, orientational metaphor is considered meaningful.

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. To copy otherwise, to republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee.
NIME’13, May 27–30, 2013, KAIST, Daejeon, Korea.
Copyright remains with the author(s).
Orientational metaphors are those which express concepts in terms of each other, via spatial relationships with the body, forming a coherent system of meaning (Lakoff and Johnson 1980; Gärdenfors 2000). The paradoxical ground between the voice as both embodied and shared between bodies is where our work sits, and for us is a question of resonance, analogous to the two hemispheres of the brain making a whole through mutual oscillation (Buzsáki 2006).

3. PROSODY IN NEW INTERFACES FOR MUSICAL EXPRESSION
The connection between mouth shape and sound is a recurring theme in the NIME proceedings; for example, the Mouthesizer was demonstrated at the first NIME workshop in 2001 (Lyons and Tetsutani 2001), controlling filters based on computer vision analysis of mouth width and height. Further developments have included the control of physical models (de Silva, Smyth, and Lyons 2004) and whole-face tracking, including mouth shape, in musical parameter mapping (Ng 2004). Voice-controlled synthesis has been pioneered by Janer and Peñalba (2007), using vocable words based on scat singing in jazz as a control mechanism. In somewhat related work, McLean and Wiggins (2008) have explored the use of vocable words by describing sounds with onomatopoeic text. In these latter two examples, phonetics are implicated for their role in describing the movements which both underlie the production of sound and inform its perception.

4. MICROPHONE
Microphone is an artwork by Communications, a collaboration between two of the present authors. Microphone was first installed at the Unleashed Devices group show at the Watermans gallery, London, in Autumn 2010, inviting par-