New matching pursuit based sinusoidal modelling method for audio coding

P. Vera-Candeas, N. Ruiz-Reyes, M. Rosa-Zurera, F. López-Ferreras and J. Curpián-Alonso

Abstract: A new method is proposed to improve sinusoidal modelling based on energy-adaptive matching pursuits. To reduce the complexity of the algorithm, an over-complete dictionary composed of complex exponentials is used and an efficient implementation is presented. An analysis-synthesis windows scheme that avoids overlapping is also proposed. For efficient quantisation of the sinusoidal model parameters, a new algorithm that significantly reduces the side information required by the decoder is described. Experimental results show the excellent performance of the proposed method for sinusoidal modelling compared with some others that are integrated into multipart models for low bit rate audio coding applications.

1 Introduction

The classical sinusoidal or harmonic model [1] comprises an analysis-synthesis framework that represents a signal $x[n]$ as the sum of a set of $K$ sinusoids with time-varying frequencies, phases and amplitudes

$$x[n] \approx \hat{x}[n] = \sum_{k=1}^{K} A_k[n] \cdot \cos\left(\omega_k[n] \cdot n + \phi_k[n]\right) \qquad (1)$$

where $A_k[n]$, $\omega_k[n]$ and $\phi_k[n]$ represent the amplitude, frequency and phase of the $k$th sinusoid, respectively.

A large number of methods have been proposed for estimating the parameters of the sinusoidal model [2–5]. Estimation of the parameters is typically accomplished by peak picking on the short-time Fourier transform (STFT). Usually, analysis by synthesis is used to verify the detection of each spectral peak. The length of the analysis frame should be signal-dependent so as to achieve an adapted multiresolution analysis [6].

The harmonic synthesis model expressed in (1) involves a peak-tracking process, which is usually carried out by means of linear interpolation of the amplitudes, while cubic interpolation is used for the phases [1, 4].
This type of interpolation imposes an important limitation, because adjacent frames must overlap in order to track changes in the input signal.

In this paper we propose a new method for estimating the parameters of the sinusoidal model, which is based on the matching pursuit algorithm. This method improves the sinusoidal modelling and avoids interpolation of the sinusoidal parameters. Further improvements are achieved if windows that do not require overlapping are considered. Outstanding results are obtained with the proposed method when rectangular and trapezoidal windows are used in the analysis and synthesis stages, respectively.

Finally, we also propose a low-complexity algorithm based on psychoacoustic principles to accomplish an efficient quantisation of the amplitudes corresponding to audible tones, when sinusoidal modelling is integrated into multipart audio coders. Its main feature is that the side information required by the decoder is significantly reduced, because the masking thresholds due to the quantised amplitudes are estimated at both the encoder and the decoder.

2 Sinusoidal modelling by matching pursuit with a dictionary of complex exponentials

2.1 Matching pursuit

The matching pursuit algorithm was introduced by Mallat and Zhang in [7]. It is an iterative algorithm that offers suboptimal solutions for decomposing a signal $x[n]$ in terms of unit-norm expansion functions $g_i[n]$ chosen from an over-complete dictionary $D$, where the $\ell_2$ norm is used as the approximation metric because of its mathematical convenience. In each step of the iterative procedure, the vector $g_i[n]$ that gives the largest inner product with the analysed signal $x[n]$ is chosen. The contribution of this vector is then subtracted from the signal and the process is repeated on the residual. At the $m$th iteration the residue is

$$r^{m+1}[n] = \begin{cases} x[n], & m = 0 \\ r^{m}[n] - a_{i(m)} \cdot g_{i(m)}[n], & m \neq 0 \end{cases} \qquad (2)$$

where $a_{i(m)}$ is the weight associated with the optimum function (or atom) $g_{i(m)}$ at the $m$th iteration.
To obtain this value, the weight $a_i^m$ associated with each function $g_i[n] \in D$ at the $m$th iteration is computed by applying the orthogonality principle between $g_i[n]$ and the residual that would result at the $(m+1)$th iteration if $g_i[n]$ were selected as the optimum function:

$$a_i^m = \frac{\langle g_i[n], r^m[n] \rangle}{\langle g_i[n], g_i[n] \rangle} = \frac{\langle g_i[n], r^m[n] \rangle}{\| g_i[n] \|^2} \qquad (3)$$

© IEE, 2004
IEE Proceedings online no. 20040044
doi: 10.1049/ip-vis:20040044
P. Vera-Candeas, N. Ruiz-Reyes and J. Curpián-Alonso are with the Electronic Department, University of Jaén, Polytechnical School, Linares (Jaén), Spain
M. Rosa-Zurera and F. López-Ferreras are with Signal Theory and Communication, University of Alcalá, Polytechnical School, Alcalá de Henares (Madrid), Spain
Paper first received 13th September 2002 and in final revised form 9th June 2003
IEE Proc.-Vis. Image Signal Process., Vol. 151, No. 1, February 2004
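As an illustration of the iteration described by (2) and (3), the following minimal sketch runs a generic matching pursuit over a dictionary of unit-norm complex exponentials. It is not the energy-adaptive variant or the efficient implementation proposed in this paper. The function names, the uniform frequency grid, the frame length of 64 samples and the dictionary size of 256 atoms are assumptions made only for this example, and the inner product is taken as $\langle g, r \rangle = \sum_n g^*[n]\, r[n]$.

```python
import numpy as np


def complex_exponential_dictionary(frame_len, num_atoms):
    """Rows are unit-norm atoms g_i[n] = exp(j*2*pi*i*n/num_atoms) / sqrt(frame_len).

    The uniform frequency grid is an assumption for this sketch; the
    dictionary is over-complete whenever num_atoms > frame_len.
    """
    n = np.arange(frame_len)
    i = np.arange(num_atoms)[:, None]
    return np.exp(2j * np.pi * i * n / num_atoms) / np.sqrt(frame_len)


def matching_pursuit(x, dictionary, num_iters):
    """Generic matching pursuit; returns [(atom index, weight), ...] and the residual."""
    residual = np.asarray(x, dtype=complex).copy()
    norms_sq = np.sum(np.abs(dictionary) ** 2, axis=1)  # ||g_i||^2 (all 1 here)
    picks = []
    for _ in range(num_iters):
        # Eq. (3): a_i^m = <g_i, r^m> / ||g_i||^2, evaluated for every atom at once.
        weights = (dictionary.conj() @ residual) / norms_sq
        # Select the atom that removes the most energy from the residual;
        # for unit-norm atoms this is the largest |<g_i, r^m>|.
        best = int(np.argmax(np.abs(weights) ** 2 * norms_sq))
        picks.append((best, weights[best]))
        # Eq. (2): subtract the contribution of the selected atom.
        residual = residual - weights[best] * dictionary[best]
    return picks, residual


# Hypothetical test signal: two atoms that are mutually orthogonal on the
# 64-sample frame (their indices differ by a multiple of num_atoms / frame_len).
frame_len, num_atoms = 64, 256
atoms = complex_exponential_dictionary(frame_len, num_atoms)
x = 2.0 * atoms[8] + 0.5 * atoms[40]
picks, residual = matching_pursuit(x, atoms, num_iters=2)
```

Because the two components of this test signal are orthogonal, two iterations are enough to recover both atoms and their weights. For a general signal the selected atoms are correlated, so the residual energy decreases at every iteration without reaching zero after a fixed number of steps.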