Auditory speech detection in noise enhanced by lipreading

Lynne E. Bernstein a,b,*, Edward T. Auer Jr. a, Sumiko Takayanagi a

a Department of Communication Neuroscience, House Ear Institute, 2100 West Third Street, Los Angeles, CA 90057, USA
b National Science Foundation, Arlington, VA 22230, USA

Received 1 March 2004; received in revised form 8 October 2004; accepted 13 October 2004

Abstract

Audiovisual speech stimuli have been shown to produce a variety of perceptual phenomena. Enhanced detectability of acoustic speech in noise, when the talker can also be seen, is one of those phenomena. This study investigated whether this enhancement effect is specific to visual speech stimuli or can rely on more generic non-speech visual stimulus properties. Speech detection thresholds for an auditory /ba/ stimulus were obtained in a white noise masker. The auditory /ba/ was presented adaptively to obtain its 79.4% detection threshold under five conditions. In Experiment 1, the syllable was presented (1) auditory-only (AO) and (2) as audiovisual speech (AVS), using the original video recording. Three types of synthetic visual stimuli were also paired synchronously with the audio token: (3) a dynamic Lissajous figure (AVL) whose vertical extent was correlated with the acoustic speech envelope; (4) a dynamic rectangle (AVR) whose horizontal extent was correlated with the speech envelope; and (5) a static rectangle (AVSR) whose onset and offset were synchronous with the acoustic speech onset and offset. Ten adults with normal hearing and vision participated. The results, in terms of dB signal-to-noise ratio (SNR), were AVS < (AVL = AVR = AVSR) < AO. That is, AVS was significantly easiest to detect, there was no difference among the synthesized visual stimuli, and all audiovisual conditions resulted in significantly lower thresholds than AO. To determine the source of the AVS advantage, in Experiment 2, a preliminary mouth gesture was edited out of the video speech token.
This manipulation defeated the advantage for both the original and the edited AVS stimuli, although the general audiovisual detection enhancement persisted. Overall, the results showed enhanced auditory speech detection with visual stimuli but no advantage for a fine-grained correlation between acoustic and optical speech signals.
© 2004 Elsevier B.V. All rights reserved.

Keywords: Audiovisual speech processing; Speech detection in noise; Speech in noise; Audiovisual speech perception; Speech processing; Lipreading; Speechreading

A subset of the results in this paper was presented at AVSP 2003, St. Jorioz, France, September 4–7, 2003. This research was supported by the National Science Foundation (BCS 0214224). This article was written with support of the National Science Foundation. The views expressed here are those of the authors and do not necessarily represent those of the National Science Foundation.

* Corresponding author. Address: Department of Communication Neuroscience, House Ear Institute, 2100 West Third Street, Los Angeles, CA 90057, USA. Tel.: +1 213 353 7044; fax: +1 213 413 0950.
E-mail addresses: lbernstein@hei.org (L.E. Bernstein), auer@ku.edu (E.T. Auer Jr.), stakayanagi@hei.org (S. Takayanagi).

Speech Communication 44 (2004) 5–18
www.elsevier.com/locate/specom
0167-6393/$ – see front matter © 2004 Elsevier B.V. All rights reserved.
doi:10.1016/j.specom.2004.10.011
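The 79.4% detection threshold targeted by the adaptive presentation corresponds to the standard 3-down/1-up staircase rule (Levitt, 1971): the tracked probability p satisfies p³ = 0.5, so p = 0.5^(1/3) ≈ 0.794. The following is a minimal illustrative sketch of that rule, not the paper's actual procedure; the logistic listener model, starting SNR, and step size are assumptions chosen only to show convergence.

```python
import math
import random

def psychometric(snr_db, threshold_db=-20.0, slope=1.0):
    """Toy logistic listener: probability of detecting the signal at this SNR.

    threshold_db and slope are illustrative assumptions, not values from
    the study.
    """
    return 1.0 / (1.0 + math.exp(-slope * (snr_db - threshold_db)))

def staircase_3down_1up(start_snr=0.0, step_db=2.0, n_trials=400, seed=1):
    """Run a 3-down/1-up staircase and estimate the 79.4% detection point.

    The SNR drops one step after three consecutive detections and rises
    one step after any miss, so the track converges where the detection
    probability is 0.5**(1/3) ~= 0.794 (Levitt, 1971). The threshold is
    estimated as the mean SNR at the later reversal points.
    """
    rng = random.Random(seed)
    snr = start_snr
    correct_run = 0
    last_direction = 0   # +1: last change was up, -1: down, 0: none yet
    reversals = []
    for _ in range(n_trials):
        detected = rng.random() < psychometric(snr)
        if detected:
            correct_run += 1
            if correct_run == 3:          # three consecutive detections
                correct_run = 0
                if last_direction == +1:  # direction flips: record reversal
                    reversals.append(snr)
                snr -= step_db            # make the task harder
                last_direction = -1
        else:
            correct_run = 0
            if last_direction == -1:      # direction flips: record reversal
                reversals.append(snr)
            snr += step_db                # make the task easier
            last_direction = +1
    late = reversals[4:]                  # discard early reversals
    return sum(late) / len(late)
```

For the toy listener above, the estimate lands near the SNR at which the logistic model yields about 79% detections (roughly -18.6 dB given these assumed parameters), illustrating why the five conditions in the study can be compared directly as dB SNR thresholds.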