Release from energetic masking caused by repeated patterns of glimpsing windows

Maury Lander-Portnoy
Department of Linguistics, University of Southern California, USA
landerpo@usc.edu

Abstract

The study of auditory masking not only provides data on how healthy and impaired listeners perform in adverse listening conditions, and thereby approximates their ability to perceive speech in the noisy environments of everyday life, but also provides insight into the mechanisms that underlie the detection and perception of speech. Previous studies (Pollack 1955; Festen & Plomp 1990; Cooper et al. 2015) have manipulated noise maskers to observe how modulating the type or characteristics of the masking noise relates to subjects' ability to detect or recognize a target signal. In this experiment, long-term average spectrum (LTAS) speech-shaped noise maskers were modulated to allow either short or long glimpsing windows (Cooke 2005), during which the target signal was unmasked, in one-second-long Morse-code patterns of eight windows. The results from 60 participants with normal hearing showed that, on an open-set word recognition task, subjects performed significantly better when the pattern of glimpsing windows repeated twice before presentation of the masked signal than in two control conditions: one with the same glimpsing windows during the signal but a different pattern beforehand, and one with the same amount of noise masking in random patterns before and during the target.

Index Terms: speech perception, speech perception in noise, energetic masking

1. Introduction

1.1. Auditory masking

Dialogue occurs in a variety of environments throughout everyday life. While some of these auditory scenes provide a blank canvas for linguistic interchange, most contain at least some level of background noise.
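The masker manipulation summarized in the abstract can be illustrated as an on/off gain envelope applied to the noise. The sketch below is a hypothetical reconstruction, not the stimulus code used in the study: the equal 125 ms slots, the 30 ms and 90 ms short/long glimpse durations, the "S"/"L" pattern notation, and the use of random reorderings of the target pattern for the two control conditions are all assumptions made for illustration.

```python
import random

# Values marked "assumed" are illustrative placeholders, not the study's parameters.
PATTERN_MS = 1000                              # one-second pattern (from the abstract)
WINDOWS_PER_PATTERN = 8                        # eight glimpsing windows (from the abstract)
SLOT_MS = PATTERN_MS // WINDOWS_PER_PATTERN    # 125 ms per window slot (assumed equal slots)
GLIMPSE_MS = {"S": 30, "L": 90}                # short/long glimpse durations in ms (assumed)


def gate(pattern):
    """Per-millisecond masker gain for one pattern: 0 = noise off (glimpse), 1 = noise on."""
    if len(pattern) != WINDOWS_PER_PATTERN:
        raise ValueError("pattern must contain exactly eight windows")
    gains = []
    for symbol in pattern:
        open_ms = GLIMPSE_MS[symbol]
        gains += [0] * open_ms + [1] * (SLOT_MS - open_ms)
    return gains


def reorder(pattern, rng, avoid=None):
    """Random reordering of the same windows, so the total amount of masking is unchanged."""
    if avoid is not None and len(set(pattern)) < 2:
        raise ValueError("a distinct reordering needs both short and long windows")
    while True:
        candidate = "".join(rng.sample(pattern, len(pattern)))
        if candidate != avoid:
            return candidate


def masker_gain(target_pattern, condition, rng):
    """Gain for two pre-target seconds plus the second that overlaps the target word."""
    if condition == "repeated":      # same pattern twice before, then during the target
        patterns = [target_pattern] * 3
    elif condition == "different":   # different patterns beforehand, same during the target
        patterns = [reorder(target_pattern, rng, avoid=target_pattern) for _ in range(2)]
        patterns.append(target_pattern)
    elif condition == "random":      # random patterns before and during the target
        patterns = [reorder(target_pattern, rng) for _ in range(3)]
    else:
        raise ValueError(f"unknown condition: {condition}")
    return [g for p in patterns for g in gate(p)]
```

For example, `masker_gain("SLSSLLSL", "repeated", random.Random(0))` returns 3000 gain values whose three one-second segments are identical. Multiplying a speech-shaped noise signal, sampled on the same time grid, by such a gain would give a gated masker; building the controls from reorderings keeps the total masked duration equal across conditions under these assumptions.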
Masking, the obstruction of the detection or comprehension of a target signal, can be either asynchronous (the masker precedes or follows the target) or synchronous (the target is partially or wholly contained within the masker). Although asynchronous masking plausibly occurred in this experiment, the conditions were counterbalanced to control for it. The focus here is therefore on synchronous maskers, the kind manipulated in the experiment presented in this paper.

Synchronous maskers are traditionally divided into two groups according to the way in which they obstruct perception of the target signal. The first type, informational masking, involves the presentation of information that is similar to, and in close proximity to, the target signal. This type of masker hinders processing of the target by offering distracting information that resembles the target and is therefore often confused with it during processing of the input. Although the target is still perceivable, the overlap of target and masker makes it difficult for cognitive processes to tease the two apart. Top-down processing plays a crucial role in release from informational masking. Because of this reliance on top-down information, informational masking has been hypothesized to be a more central cognitive process occurring further downstream in the auditory transduction pathway [1][2], and the factors that confound its operation are therefore higher-level processes such as attention or perceptual grouping [3].

Its counterpart, energetic masking, is conversely conceived of as a peripheral masking phenomenon [4][5]. Energetic masking is thought to hinder perception by obscuring the target signal with surrounding noise. Even when the signal and noise have quite different characteristics, as in the case of speech and white noise, too much noise in the input prevents proper processing of the signal.
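One common way to quantify energetic masking, in the spirit of the glimpsing account cited in the abstract (Cooke 2005), is to count the time-frequency regions in which the target dominates the masker. The sketch below is not the analysis used in this paper; the frame length, hop size, Hann window, and the 3 dB local-SNR criterion are assumptions chosen for illustration, and it presumes the target and masker are available as separate, equal-length signals.

```python
import numpy as np


def power_spectrogram(x, frame_len=512, hop=256):
    """Short-time power spectrum (frames x frequency bins) using a Hann window."""
    window = np.hanning(frame_len)
    n_frames = 1 + (len(x) - frame_len) // hop
    frames = np.stack([x[i * hop:i * hop + frame_len] * window for i in range(n_frames)])
    return np.abs(np.fft.rfft(frames, axis=1)) ** 2


def glimpse_proportion(target, masker, threshold_db=3.0, eps=1e-12):
    """Fraction of time-frequency cells where the target exceeds the masker by threshold_db.

    target and masker are equal-length arrays holding the separately available
    signal and noise; eps avoids division by zero where the masker is silent.
    """
    s = power_spectrogram(np.asarray(target, dtype=float))
    n = power_spectrogram(np.asarray(masker, dtype=float))
    local_snr_db = 10.0 * np.log10((s + eps) / (n + eps))
    return float(np.mean(local_snr_db > threshold_db))
```

Silencing the noise for part of the signal (for instance, zeroing the first half of a white-noise masker) raises this proportion, which is one way to picture how unmasked glimpsing windows could offer release from energetic masking under these assumptions.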
The difficulty experienced with this type of masker is thought to be due to an overlap of noise and target in the peripheral sensory organs: the parts of the system used to detect the target are also used to detect the noise, so the signal becomes washed out by interference. Because of this hypothesized peripheral nature, energetic maskers exploit low-level confounds, for example by having the same spectral characteristics as the long-term average spectrum (LTAS) of speech while carrying none of the temporal information contained in the speech envelope [6]. The more closely the peripheral activation produced by the noise matches that produced by the target, the more the masker can interfere with and block sensory resources.

A useful line of inquiry in the study of masking is how its effects can be counteracted, known as “release from masking”. Release from masking is important to study because learning what defeats masking gives better insight into the process by which masking obscures perception. Many mechanisms have been studied with regard to release from masking, but one of the main ones is auditory stream segregation.

1.2. Auditory scene analysis

Auditory scene analysis [7] is the synthesis of temporally and spectrally disparate acoustic information into cohesive auditory percepts (for a review, see [8]). These percepts are referred to as auditory objects, and their formation is key to interacting with the auditory world. The question of which mechanisms are used in this process has led researchers to examine the criteria used in the formation and separation of these auditory objects. Two important variables in auditory stream segregation are time and attention: stream segregation is an online process that takes time to occur and one that must be attended to [9][10].
Copyright 2016 ISCA INTERSPEECH 2016 September 8–12, 2016, San Francisco, USA http://dx.doi.org/10.21437/Interspeech.2016-1571 1672

It does not occur instantaneously, and if attention is shifted away from an auditory object, streaming rapidly resets and the process must start all over again [11]. It is this stream formation and segregation, or something similar, that we hypothesized might provide release from energetic