IEICE TRANS. FUNDAMENTALS, VOL.E89–A, NO.1 JANUARY 2006, p.240

PAPER

Common Acoustical Pole Estimation from Multi-Channel Musical Audio Signals

Takuya YOSHIOKA †a), Student Member, Takafumi HIKICHI ††, Masato MIYOSHI ††, Members, and Hiroshi G. OKUNO †, Nonmember

SUMMARY This paper describes a method for estimating the amplitude characteristics of poles common to multiple room transfer functions from musical audio signals received by multiple microphones. Knowledge of these pole characteristics would make it easier to manipulate audio equalizers, since they correspond to the room resonance. It has been proven that an estimate of the poles can be calculated precisely when a source signal is white. However, if a source signal is colored as in the case of a musical audio signal, the estimate is degraded by the frequency characteristics originally contained in the source signal. In this paper, we consider that an amplitude spectrum of a musical audio signal consists of its envelope and fine structure. We assume that musical pieces can be classified into several categories according to their average amplitude spectral envelopes. Based on this assumption, the amplitude spectral envelope of the musical audio signal can be obtained from prior knowledge of the average amplitude spectral envelope of a musical piece category into which the target piece is classified. On the other hand, the fine structure is identified based on its time variance. By removing both the spectral envelope and the fine structure from the amplitude spectrum estimated with the conventional method, the amplitude characteristics of the acoustical poles can be extracted. Simulation results for 20 popular songs revealed that our method was capable of estimating the amplitude characteristics of the acoustical poles with a spectral distortion of 3.11 dB. In particular, most of the spectral peaks, corresponding to the room resonance modes, were successfully detected.
key words: room resonance, common acoustical pole, musical audio signal

Manuscript received January 7, 2005.
Manuscript revised May 26, 2005.
Final manuscript received October 7, 2005.
† The authors are with the Department of Intelligence Science and Technology, Graduate School of Informatics, Kyoto University, Kyoto-shi, 606-8501 Japan.
†† The authors are with the NTT Communication Science Laboratories, NTT Corporation, Kyoto-fu, 619-0237 Japan.
a) E-mail: takuya@kuis.kyoto-u.ac.jp
DOI: 10.1093/ietfec/e89–a.1.240

1. Introduction

Room resonances significantly change the frequency characteristics of source signals. Some of their effects are beneficial, while others are not. For example, resonances generally give sounds spatial impression and richness; by contrast, they may reduce sound intelligibility or cause timbre coloration. Skilled audio engineers can suppress or emphasize these effects using audio equalizers. If resonance information were available, however, even technically untrained musical performers could easily enrich their own performances by controlling the effects of room resonances with audio equalizers.

An autoregressive moving average (ARMA) model can be used to represent a room transfer function, which describes the sound transmission characteristics from a source to a microphone in a room [1]. The AR part represents the poles common to the transfer functions between multiple sources and microphones in a room and describes the room resonance characteristics. The MA part varies depending on the source and microphone positions. If such common acoustical poles could be estimated from the signals received by the microphones, even technically untrained musical performers would be able to control the effects of resonances easily. Such pole estimation is possible prior to a performance with the conventional method by using a white source signal [2].
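The ARMA representation described above can be written compactly as follows. This is only a sketch in notation assumed here (the channel index i, the orders P and Q, and the coefficient symbols a_k and b_{i,k} do not appear in this part of the paper): the denominator A(z) is shared by all M channels, while the numerator B_i(z) depends on the source and microphone positions.

```latex
% Assumed notation: i = channel index, M = number of channels,
% P, Q = AR and MA orders, S(z) = source signal, X_i(z) = signal at microphone i.
\[
  H_i(z) \;=\; \frac{B_i(z)}{A(z)}
         \;=\; \frac{\sum_{k=0}^{Q} b_{i,k}\, z^{-k}}{1 + \sum_{k=1}^{P} a_k\, z^{-k}},
  \qquad i = 1, \dots, M,
\]
\[
  X_i(z) = H_i(z)\, S(z)
  \;\;\Longrightarrow\;\;
  \log\lvert X_i(e^{j\omega})\rvert
    = \log\lvert S(e^{j\omega})\rvert
    + \log\lvert B_i(e^{j\omega})\rvert
    - \log\lvert A(e^{j\omega})\rvert .
\]
```

In this form, the common acoustical poles are the roots of A(z). When the source is white, the term in S is flat over frequency and does not distort the shape of the estimated pole spectrum; when the source is colored, that term is mixed into whatever is estimated from the received signals.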
However, pole estimation during a performance would be much more convenient for the performers because it imposes no additional task on them. In addition, compared with a priori pole estimation, it may be more robust to sound-field fluctuations caused by factors such as room temperature or the presence of an audience, because it does not require any prior measurement of the sound field. However, when a source signal is colored, as in the case of a musical audio signal, the poles estimated with the conventional method are smeared by the characteristics of this source signal [3].

In this paper, the conventional common-pole estimation method mentioned above is extended so that it can estimate the amplitude characteristics of such poles even when the source signal is a musical audio signal. We consider that the amplitude spectrum estimated with the conventional method from a musical audio signal windowed in a short-time frame may include the time-invariant amplitude spectrum of the common acoustical poles together with the amplitude spectral envelope and fine structure of the musical signal. We assume here that musical pieces can be classified into several categories according to their average amplitude spectral envelopes; that is, the average of the amplitude spectral envelopes over all the time frames covering the musical signal is similar to that of the musical piece category to which the target piece belongs. We use the prior knowledge of such average amplitude spectral envelopes of musical piece categories as the average amplitude spectral envelope of the musical signal. On the other hand, since the amplitude spectral fine structure of the musical signal seems to change frame by frame, the fine structure in each time frame may be extracted by comparing the spectrum in that frame with those in several adjoining frames.
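The decomposition described above can be illustrated with a toy example. The sketch below is not the authors' algorithm: it assumes that the three components add in the log-amplitude domain, it replaces the comparison of adjoining frames with a plain average over all frames, and it uses entirely synthetic data. The function names `estimate_pole_log_amplitude` and `make_synthetic_frames` are hypothetical.

```python
import numpy as np


def estimate_pole_log_amplitude(frame_log_spectra, category_log_envelope):
    """Toy estimate of the time-invariant (pole) log-amplitude spectrum.

    frame_log_spectra: array of shape (n_frames, n_bins), one log-amplitude
        spectrum per short-time frame.
    category_log_envelope: array of shape (n_bins,), the assumed average
        log-amplitude envelope of the musical piece category.
    """
    frame_log_spectra = np.asarray(frame_log_spectra, dtype=float)
    category_log_envelope = np.asarray(category_log_envelope, dtype=float)
    # Averaging over frames suppresses components that change frame by frame
    # (a crude stand-in for identifying the fine structure).
    time_average = frame_log_spectra.mean(axis=0)
    # Subtracting the category envelope leaves the time-invariant part.
    return time_average - category_log_envelope


def make_synthetic_frames(n_frames=16, n_bins=64):
    """Build synthetic frames = pole spectrum + envelope + fine structure."""
    bins = np.arange(n_bins)
    # Two resonance-like bumps standing in for the room's pole spectrum.
    pole = (6.0 * np.exp(-0.5 * ((bins - 20) / 2.0) ** 2)
            + 4.0 * np.exp(-0.5 * ((bins - 45) / 2.0) ** 2))
    # A smooth spectral tilt standing in for the category envelope.
    envelope = -0.05 * bins
    # A harmonic-like ripple whose sign and size change from frame to frame;
    # the frame weights sum to zero over one full cycle.
    ripple = 3.0 * np.cos(2.0 * np.pi * bins / 8.0)
    weights = np.cos(2.0 * np.pi * np.arange(n_frames) / n_frames)
    fine = weights[:, None] * ripple[None, :]
    frames = pole[None, :] + envelope[None, :] + fine
    return frames, pole, envelope


if __name__ == "__main__":
    frames, pole, envelope = make_synthetic_frames()
    estimate = estimate_pole_log_amplitude(frames, envelope)
    print("largest peak at bin", int(np.argmax(estimate)))
```

The toy data are built so that the fine structure averages out exactly over the frames. With a real musical signal, neither that cancellation nor the match between the piece and its category envelope would be exact, so some residual error would remain.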
By eliminating the spectral envelope and the fine structure from the spectrum obtained with the conventional method, we can estimate the amplitude spectrum of

Copyright © 2006 The Institute of Electronics, Information and Communication Engineers