CONVOCARE CONSONARE: A DUET FOR FOUR VOICES

David Gerhard
University of Regina
Dept. of Computer Science
Dept. of Music
Regina, SK Canada
gerhard@cs.uregina.ca

Ellen Moffat
Independent Artist
Saskatoon, SK, Canada
www.ellenmoffat.ca
moffat.e@sasktel.net

ABSTRACT

The human voice continues to provide source material for ongoing explorations in musical expression. As one of the oldest musical instruments, the human voice is simultaneously universal in availability, accessibility, understanding and mystery. It provides rich material for the subconscious interpretation of language and the construction of meaning. This paper presents the development of a phonetics-driven audio-visual synthesis engine, a physical interface to the synthesis engine, and an original sound composition, convocare consonare, which fuses the synthesis engine and the physical interface as an artistic investigation of polyphonic composition.

As background, the phonetic components of voice are explored, considering historical and recent uses of phonetics as source material for musical expression. Technical, acoustic and artistic parameters of voice are discussed, including artificial generation by computer systems and perception by humans. A set of phoneme classification taxonomies is developed which suggests possibilities for future exploration of voice as musical source.

convocare consonare is the latest in an ongoing collection of works exploring phoneme-level voice in art. The development experiences are presented as a case study in the use of low-level linguistic material as source for music, interactive art, and expression.

1. INTRODUCTION

The human voice is an expressive and eloquent musical instrument. Song and other human-instrument music is pervasive in human history and is one of the primary forms of modern entertainment. Much of the “content” of current popular music is contained in the singer’s lyrics, expression and style.
Modern electronic music is often seen to move away from the concept of voice and lyric to more abstract soundscape environments. While the two seem disparate, the notion of the voice can be included in modern electronic music as source material, with several obvious advantages.

Many modern new-music artists have made significant use of phonemes, language, and voice in their compositions [1, 9], and the use of individual components of language as notes or grains of music is not novel [14], but there remain opportunities for exploration. This work describes our experiences with the use of voice in a number of specific contexts. The human voice has numerous advantages over other source material:

• Voice is recognizable in its original form and has perceptual and psychological immediacy, yet it also permits easy manipulation to create sounds which are familiar and yet not familiar.

• Voice is spectrally rich, allowing the creation of musically interesting content on a note-by-note basis.

• Voice is widely and freely available, not requiring algorithmic or copyrighted acquisition.

• The acoustic properties of voice are interpreted at a subconscious level but can be brought to conscious awareness with little effort or attention. This conscious/subconscious awareness can be used to the musician’s advantage.

• The computational, spectral, and physical characteristics of voice have been studied in other disciplines (notably speech recognition) for many years, yet the application of these studies to artistic expression continues to hold novelty.

1.1. The phoneme misnomer

Phonemes in language do not exist in isolation; rather, each is shaped by the previous and subsequent phonemes in the speech sequence. Indeed, words themselves are not separated as such in the acoustic stream, but are connected one to the next, and even overlap when the final phoneme of one word is identical to or related to the initial phoneme of the next.
Given that phonemes do not exist in isolation, how might one effectively take advantage of preexisting cognitive models of the components of language? Section 3 details the experimentation that led to our particular implementations that address these concerns.

Another misnomer of phonetic deconstruction is the categorical perception and production of phonemes. The