Prediction-Based Coding of Speech Signals Using Multiscale Recurrent Patterns

Frederico S. Pinagé †‡, Murilo B. de Carvalho §, Eduardo A. B. da Silva, Sergio L. Netto
frederico.pinage@fucapi.br, murilo@telecom.uff.br, eduardo@lps.ufrj.br, sergioln@lps.ufrj.br

PEE-COPPE/DEL-Poli, Federal University of Rio de Janeiro, POBox 68.504, Rio de Janeiro, RJ, 21945-970, Brazil.
§ TET/CTC, Federal University Fluminense, R. Passos da Pátria 156, Niteroi, RJ, 24210-240, Brazil.
Fundação Centro de Análise, Pesquisa e Inovação Tecnológica, Distrito Industrial, Manaus, AM, 69075-351, Brazil.

Abstract—This paper investigates the performance of the multidimensional multiscale parser (MMP) algorithm for speech coding. A new prediction-based scheme is considered, where the MMP algorithm operates on the associated prediction-error signal instead of the original speech signal. Other features are also considered, namely: nonuniform quantization of the MMP initial dictionary, use of an auxiliary dictionary of recent past samples, and quantization/normalization during the dictionary updating stage. It is verified that the resulting MMP scheme, combining all these techniques, can achieve at 8 kbps perceptual objective scores comparable to those of the ITU-T G.729 codec.

I. INTRODUCTION

Current speech coders achieving the best compromise between voice quality and compression rate are based on the code-excited linear prediction (CELP) approach [1]. These coders employ the analysis-by-synthesis (AbS) procedure to determine the input signal to a linear prediction model of the human vocal tract. CELP-based speech coders, such as the ITU-T G.729 recommendation [2], yield high voice quality at coding rates around 4–10 kbps, whereas standard waveform coders, such as the ITU-T G.711 [3] or G.726 [4] recommendations, operate at 64 and 32 kbps, respectively.

The so-called multidimensional multiscale parser (MMP) [5], [6] uses past portions of the signal to perform the encoding process.
These past segments, after proper encoding, are scaled to distinct lengths and incorporated into a dictionary, thus providing a learning ability to the overall MMP scheme. The MMP has been successfully applied to the coding of, for instance, electrocardiogram signals [7], stereoscopic images [8], and three-dimensional images [9]. Since the MMP algorithm operates exclusively in the time or space domains, it can be seen as a waveform codec [10].

The initial application of the MMP algorithm to speech coding, as presented in [11], has motivated further investigation of its coding performance in this new context, by incorporating additional features into its learning process. In particular, in this paper we assess the MMP performance when operating on the residue signal yielded by the linear prediction of the speech signal under analysis, whereas reference [11] considers the direct MMP coding of the speech signal. It is verified that the residue signal presents a higher regularity than the original speech signal, which better suits the MMP learning process. This increases, for a given coding rate, the perceptual quality of the MMP reconstructed signal, as quantified by the ITU-T P.862 PESQ (perceptual evaluation of speech quality) [12] recommendation.

In order to evaluate the performance of the MMP algorithm in coding the prediction error of a given speech signal, this paper is organized as follows: Section II presents the concepts associated with the linear prediction of speech signals; Section III introduces the prediction-based MMP algorithm with the additional features considered in this work, namely: a nonuniform initial dictionary, an auxiliary displacement dictionary, and an updating procedure using quantized and/or normalized signal segments; Section IV presents the experimental results for these different MMP versions. The results for the 8-kbps coding rate allow a direct comparison to the G.729 performance.
It is verified that at this coding rate the prediction-based MMP algorithm achieves a PESQ score, after a proper mapping onto the mean-opinion score (MOS) scale, of 3.69, which is quite close to the G.729 score of 3.84 for the same database.

II. LINEAR PREDICTION

Linear prediction (LP) is a modeling approach which estimates the current sample value of a signal s(n) using a linear combination of N of its past samples, that is

$$\hat{s}(n) = \sum_{i=1}^{N} a_i \, s(n - i), \tag{1}$$