Efficient Encoding of a Sinusoidally-Modelled Audio Signal Using Compressed Sensing

Anthony Griffin, Christos Tzagkarakis, Toni Hirvonen, Athanasios Mouchtaris and Panagiotis Tsakalides *

Institute of Computer Science, Foundation for Research and Technology - Hellas (FORTH-ICS) and Department of Computer Science, University of Crete
Heraklion, Crete, Greece
{agriffin, tzagarak, tmhirvo2, mouchtar, tsakalid}@ics.forth.gr

Abstract

In this paper, the compressed sensing (CS) methodology is applied to the harmonic part of sinusoidally-modelled audio signals. As this part of the model is sparse by definition in the frequency domain, we investigate whether CS can be used for encoding this signal at low bitrates, instead of encoding the sinusoidal parameters (amplitude, frequency, phase) as current state-of-the-art methods do. CS can sample signals at a rate much lower than the Nyquist rate if they are sparse in some basis, which makes it a natural choice in this context. It also has the potential benefit of moving the computational burden from the encoder to the decoder. Previous work presented an initial investigation into the performance of this scheme, and this paper demonstrates how the performance can be further improved to rival that of a state-of-the-art encoder.

1 Introduction

The growing demand for audio content far outpaces the corresponding growth in users’ storage space or bandwidth. Thus there is a constant incentive to further improve the compression of audio signals. This can be accomplished either by applying compression algorithms to the actual samples of a digital audio signal, or by using a signal model and then encoding the model parameters as a second step.

The sinusoidal model [1] represents an audio signal using a small number of time-varying sinusoids. The remaining error signal, often termed the residual signal, can also be modelled to further improve the resulting subjective quality of the sinusoidal model [2].
The sinusoidal model allows for a compact representation of the original signal and for efficient encoding and quantisation. State-of-the-art methods for encoding and compressing the sinusoidal model are based on directly encoding its parameters (amplitudes, frequencies, phases) [3, 4]. However, these methods can be complex and computationally intensive on the encoder side in order to achieve optimum performance.

In previous work [5], we proposed using the compressed sensing (CS) [6, 7] methodology to encode and compress sinusoidally-modelled audio signals. Compressed sensing seeks to represent a signal using a small number of linear, non-adaptive measurements. Usually the number of measurements is much lower than the number of samples needed if the signal is sampled at the Nyquist rate. For the original signal to be reconstructed correctly, CS requires that the signal is very sparse in some basis, in the sense that it is a linear combination of a small number of basis functions. CS generally shifts complexity from the encoder to the decoder, and thus has many simple and computationally efficient implementations for the encoder side.

Clearly, the sinusoidally-modelled part of an audio signal is a sparse signal, and it is thus natural to wonder how CS might be used to encode such a signal, and to reduce the complexity in the encoder. Our method, described in [5], encodes the time-domain signal rather than the sinusoidal model parameters that state-of-the-art methods encode [3, 4]. The advantage is that the encoding operation is simplified into randomly sampling the time-domain sinusoidal signal, which is obtained after applying the sinusoidal model to a monophonic audio signal. The initial performance results, while encouraging, were inferior to those of the state-of-the-art encoders, and the reduced complexity in the encoder was offset by a significant increase in complexity in the decoder.
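To make the random-sampling idea above concrete, the following minimal Python sketch keeps a random subset of the time-domain samples of a sum of a few sinusoids and then recovers the full frame from those samples alone. It is not the encoder or decoder of [5]. The reconstruction by orthogonal matching pursuit, the restriction of frequencies to exact DFT bins, the absence of quantisation, and every name and parameter value (`N`, `K`, `M`, `true_bins`, `omp`, and so on) are assumptions made purely for illustration.

```python
import math
import random


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def atoms(k, times, n):
    """Cosine and sine atoms of DFT bin k, evaluated only at the given sample times."""
    w = 2.0 * math.pi * k / n
    return [math.cos(w * t) for t in times], [math.sin(w * t) for t in times]


def least_squares(cols, y):
    """Solve min ||A c - y|| through the normal equations (adequate for a few columns)."""
    p = len(cols)
    # Augmented system [A^T A | A^T y], reduced by Gauss-Jordan elimination.
    g = [[dot(cols[i], cols[j]) for j in range(p)] + [dot(cols[i], y)] for i in range(p)]
    for i in range(p):
        pivot = max(range(i, p), key=lambda r: abs(g[r][i]))
        g[i], g[pivot] = g[pivot], g[i]
        for r in range(p):
            if r != i:
                f = g[r][i] / g[i][i]
                g[r] = [a - f * b for a, b in zip(g[r], g[i])]
    return [g[i][p] / g[i][i] for i in range(p)]


def omp(y, times, n, k_sparse):
    """Greedy recovery: pick the bin that best matches the residual, then refit all picks."""
    residual, support, cols, coef = list(y), [], [], []
    for _ in range(k_sparse):
        scores = {}
        for k in range(1, n // 2):  # skip DC and Nyquist, where the sine atom vanishes
            if k not in support:
                c, s = atoms(k, times, n)
                scores[k] = dot(residual, c) ** 2 + dot(residual, s) ** 2
        best = max(scores, key=scores.get)
        support.append(best)
        cols.extend(atoms(best, times, n))
        coef = least_squares(cols, y)
        fit = [sum(cf * col[i] for cf, col in zip(coef, cols)) for i in range(len(y))]
        residual = [yi - fi for yi, fi in zip(y, fit)]
    return support, coef


# Hypothetical frame: N samples, K sinusoids, M random measurements (illustrative values).
N, K, M = 256, 3, 80
true_bins = [12, 40, 77]
amps = [1.0, 0.8, 0.6]
phases = [0.3, 1.1, 2.0]
x = [sum(a * math.cos(2.0 * math.pi * b * t / N + p)
         for a, b, p in zip(amps, true_bins, phases)) for t in range(N)]

# "Encoder": keep M randomly chosen time-domain samples (fixed seed for repeatability).
rng = random.Random(0)
times = sorted(rng.sample(range(N), M))
y = [x[t] for t in times]

# "Decoder": estimate the sparse spectrum, then resynthesise the whole frame.
support, coef = omp(y, times, N, K)
x_hat = [0.0] * N
for i, k in enumerate(support):
    c, s = atoms(k, range(N), N)
    x_hat = [v + coef[2 * i] * ci + coef[2 * i + 1] * si for v, ci, si in zip(x_hat, c, s)]

max_err = max(abs(a - b) for a, b in zip(x, x_hat))
print(sorted(support), max_err)
```

In this toy setting the encoder does nothing beyond selecting samples, while the decoder carries the search and the least-squares refits, which mirrors the shift of complexity from encoder to decoder noted above. A practical scheme would also have to quantise the measurements and handle frequencies that do not fall on the analysis grid.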
In this paper, we extend our previous work in several areas, including a considerable reduction in reconstruction complexity, and show that this method can now achieve performance to rival that of the state-of-the-art encoders.

* This work was funded by the Marie Curie TOK-DEV “ASPIRE” grant within the 6th European Community Framework Program.

2 System Model and Results

Compressed sensing requires that the measurement basis is incoherent with the sparsity basis; this is often satisfied by using random Gaussian measurements. As we know that the sinusoidal model of an audio signal is sparse in the