Speech Communication 27 (1999) 235–259

Dynamic sound stream formation based on continuity of spectral change ¹

Ikuyo Masuda-Katsuse *, Hideki Kawahara

2-2 Hikaridai Seika-cho Soraku-gun, Kyoto 619-0288, Japan

Received 25 January 1998; received in revised form 18 September 1998

* Corresponding author. Address: Institute of Systems and Information Technologies, Kyushu 2-1-22-7F Momochihama, Sawara-ku, Fukuoka 814-0001, Japan. Tel.: +81 92 852 3460; fax: +81 92 852 3465; e-mail: ikuyo@k-isit.or.jp
¹ Speech files available. See http://www.elsevier.nl/locate/specom

PII: S0167-6393(98)00084-3

Abstract

A proposed computational model that dynamically tracks and predicts changes in spectral shapes was verified in both psychophysical experiments and engineering applications. The results of the psychophysical experiments confirmed the model's validity and suggested that 'the rule of good continuity' also held in audition. Furthermore, a stream segregation system was implemented with the proposed model. It was composed of simultaneous grouping and sequential integration processes. An effective integration of the two processes was performed by dynamically controlling the sequential integration based on the reliability of the output of the simultaneous grouping. Finally, we applied this system to phonemic restoration and segregation of two simultaneous utterances, showing the proposed model to be effective for such engineering applications. © 1999 Elsevier Science B.V. All rights reserved.

Keywords: Auditory scene analysis; Computational model; Dynamic stream formation; Speech segregation; Phonemic restoration; Prediction

1. Introduction

An important task in audition is to decompose a mixture that arrives at the ears into elemental components and to integrate these components into individual auditory objects. Bregman defined auditory scene analysis as a process in which auditory evidence coming from one sound source is integrated into a perceptual unit (Bregman, 1993) and listed many auditory grouping cues and heuristic rules used in integrating such evidence into perceptual units in his book Auditory Scene Analysis (Bregman, 1990). As cues for sound source segregation, he listed apparent spatial origin, timbre, fundamental frequency, temporal proximity, harmonicity, and so on. He proposed two processes in segregating sound sources from a mixture; one is based on schemata and the other is not. He called the latter process primitive auditory scene analysis, which is used prior to the schema-based process. Since he regarded primitive auditory scene analysis as important, he systematically investigated the kinds of general acoustical regularities used as cues for sound segregation.

Primitive auditory scene analysis is composed of two processes (Bregman, 1990). One is an Integration of Simultaneous Components, which integrates spectral components coming from the same sound source. In the case of speech sounds,