DOES AUDITORY MASKING EXPLAIN THE HIGH VOICE SUPERIORITY?

Song Hui Chon, David Huron
School of Music, Ohio State University

ABSTRACT

The majority of music in the world employs multiple concurrent parts. Among these parts, the uppermost part or voice often carries the melody. Given the upward spread of auditory masking, one might expect high-voice melodies to be at odds with masking theory. In this paper, we investigate the mutual masking effects of concurrent high- and low-pitched complex tones. In addition, we consider four types of spectral envelope patterns and discuss their influence on auditory masking.

1. INTRODUCTION

One of the most characteristic differences between music and speech is the typical number of concurrent sound sources involved. Speech usually involves social turn-taking, with a single speech stream alternating between the conversants. By contrast, although a significant amount of music-making involves a single stream (what musicians call “monophony”), the majority of the world's music-making involves multiple concurrent sound sources and multiple concurrent auditory streams.

Among these concurrent parts, the melody, which is generally more important than the rest, is usually placed in the highest voice or part. This practice is consistent with the finding that changes are most easily detected in the highest stream (Zenatti, 1969; Palmer & Holleran, 1994; Crawley et al., 2002), an effect known as the “High Voice Superiority.” The high voice superiority has been examined quite extensively by Trainor and her team (Fujioka et al., 2005, 2008; Marie & Trainor, 2013), who reported that it might “result from the neurophysiological characteristics of the peripheral auditory system.”

Auditory masking (ANSI, 1960) is defined as “1. The process by which the threshold of audibility for one sound is raised by the presence of another (masking) sound,” and “2.
The amount by which the threshold of audibility for one sound is raised by the presence of another (masking) sound. The unit customarily used is the decibel.” In other words, when two sounds are present at the same time, one may not be heard as well as it would be in isolation because of the other. There are two types of auditory masking, temporal masking and frequency masking, distinguished by the domain in which the two sounds are close to each other. In this paper, we consider only frequency masking.

Frequency masking has been well studied in the context of auditory filters and critical bandwidth (Fletcher, 1940; Greenwood, 1961; Plomp & Levelt, 1965; Scharf, 1970; Patterson, 1976; Moore & Glasberg, 1983; Zwicker & Fastl, 1990). A critical band usually refers to a frequency range within which two pure tones interact and are not perfectly resolved. The masking pattern, also known as the spreading function, has been obtained by measuring the degree of masking as a function of the placement of two pure tones within a critical band. The masking pattern is not symmetric around the center of the critical band. Rather, it has a longer tail toward higher frequencies, which is known as the “upward spread of masking.” This means that a lower tone masks a higher tone within the same critical band more than the higher tone masks the lower one.

The fact that a lower tone masks a higher tone more effectively seems to contradict the High Voice Superiority and the fact that most melodies are placed in the highest voice. Or do the two really conflict? To answer this question, we implemented a computer simulation in MATLAB that examines pairs of complex tones and determines which tone masks the other more effectively. We considered four different amplitude patterns to test their impact on masking effectiveness.
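
The asymmetry of the upward spread of masking described above can be illustrated with a small numerical sketch. The code below is written in Python rather than MATLAB and is only an illustration. It assumes the widely used Schroeder-style spreading function on the Bark scale and the Zwicker and Terhardt approximation for converting Hz to Bark; neither is confirmed to be the exact form used in the simulation reported here. The function names (`hz_to_bark`, `spreading_db`, `mutual_spread`) and the 400 Hz and 500 Hz test tones are hypothetical choices made for this example.

```python
import math


def hz_to_bark(freq_hz):
    """Approximate critical-band rate in Bark (Zwicker and Terhardt formula).

    Assumption: this conversion is chosen for illustration only.
    """
    return (13.0 * math.atan(0.00076 * freq_hz)
            + 3.5 * math.atan((freq_hz / 7500.0) ** 2))


def spreading_db(dz_bark):
    """Schroeder-style spreading function, in dB relative to its peak.

    dz_bark is the maskee position minus the masker position in Bark,
    so a positive value means the maskee lies above the masker.
    Assumption: this is a generic textbook form, not the paper's own model.
    """
    x = dz_bark + 0.474
    return 15.81 + 7.5 * x - 17.5 * math.sqrt(1.0 + x * x)


def mutual_spread(low_hz, high_hz):
    """Return (low masker onto high tone, high masker onto low tone) in dB."""
    dz = hz_to_bark(high_hz) - hz_to_bark(low_hz)
    return spreading_db(dz), spreading_db(-dz)


if __name__ == "__main__":
    up, down = mutual_spread(400.0, 500.0)
    print(f"400 Hz masker evaluated at 500 Hz: {up:.1f} dB")
    print(f"500 Hz masker evaluated at 400 Hz: {down:.1f} dB")
```

Under these assumptions, the value returned for the lower masker acting on the higher tone is less negative than the value for the reverse direction, which is the pure-tone asymmetry the text refers to. The sketch deliberately covers only two pure tones; it says nothing about complex tones with many partials, which is the case the simulation addresses.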
Our hypothesis is that, on average, higher-pitched harmonic complex tones will tend to mask the partials of comparable lower-pitched tones more than vice versa.

2. COMPUTATIONAL MODEL

A simulation based on the psychoacoustic model in Bosi and Goldberg (2003) was implemented to compare the mutual masking between pairs of complex tones differing in pitch. The amount of masking depends on many factors, including the spectral envelopes of the participating tones. Hence, we considered four types of spectral envelope (“uniform,” “increasing,” “grand average” and “per-pitch average”; the last two are from Plazak et al., 2010). For each of the four envelopes, we examined the masking effects for 81 pitches from B0 to G7. Stimuli were generated in MATLAB using sinusoids at the fundamental frequency plus 323 overtones for the first three envelope types. The sound files from Plazak et al. (2010) were used for the “per-pitch average” type.

In our simulation, we paired all possible non-unison tones spanning the range B0 to C7. That is, we began by generating complex tones for the pair B0 and C1, each tone having the amplitude envelope pattern under study. The complex tones were then used as input to a masking model. The masking model begins by carrying out a Fast Fourier Transform (FFT). The frequency and amplitude resolution of the Fourier analysis depends to some degree on whether the harmonics are integrally related to the FFT size. The FFT size was chosen to give a resolution of 1 Hz, so to maintain good resolution we rounded the synthesized input frequencies so that they would be at