Performance Evaluation of Digital Audio Watermarking Algorithms

J. D. Gordy and L. T. Bruton
Department of Electrical and Computer Engineering
University of Calgary
2500 University Drive N.W.
Calgary, Alberta, Canada T2N 1N4

Abstract- We propose an algorithm-independent framework for rigorously comparing digital watermarking algorithms with respect to bit rate, perceptual quality, computational complexity, and robustness to signal processing. The framework is used to evaluate five audio watermarking algorithms from the literature, revealing that frequency domain techniques perform well under the criteria.

I. INTRODUCTION

In the past few years, a need has arisen for protecting copyright ownership of electronic media. Powerful and low-cost computers allow people to easily create and copy multimedia content, and the Internet has made it possible to distribute this information at very low cost. However, these enabling technologies also make it easy to illegally copy, modify, and redistribute multimedia data without regard for copyright ownership. A recent example of this problem is the controversy regarding piracy of high-quality music across the Internet in MPEG Layer III (MP3) format [1].

Digital watermarking is seen as a partial solution to the problem of protecting digital media, for it allows content creators to embed sideband data into a host signal, such as author or copyright information. Many techniques have been proposed for watermarking audio, image, and video, and comprehensive surveys of these technologies may be found in [2] and [3]. However, the literature lacks an effective means of comparing the different approaches. An evaluation framework was recently described, but is limited to digital image watermarking [4].

The goal of this paper is to present an algorithm-independent set of criteria for quantitatively comparing the performance of digital watermarking algorithms.
This framework is then used to evaluate a selection of five audio watermarking algorithms from the literature.

The paper is organized as follows. In Section II we present our evaluation criteria, and in Section III we provide experimental data and an analysis of the evaluated algorithms. Finally, in Section IV we summarize the results of this investigation.

II. PERFORMANCE EVALUATION FRAMEWORK

In this section we provide a description of the performance evaluation framework.

A. Conventions

In order to provide a common basis to describe and compare the algorithms, the following conventions are employed. Let $x(n)$ represent a host signal of length $N$ samples, divided into $B = N/M$ blocks of $M$ samples each. One bit is embedded into each block. A block division was chosen because it conveniently allows for a variable number of bits to be embedded by adjusting the block size. $\tilde{x}(n)$ represents the watermarked signal, and $w(m) \in \{-1, +1\}$ is a bipolar binary sequence of bits to be embedded within the host signal, for $0 \le m \le B-1$. Finally, $\tilde{w}(m)$ represents the set of watermark bits extracted from the watermarked host signal.

B. Evaluation Criteria

Four criteria were carefully selected as part of the evaluation framework. They were chosen to reflect the fact that watermarking is effectively a communications system. In addition, the criteria are simple to test, and may be applied to any type of watermarking system (audio, image, or video). It is important to note that the requirements of a practical watermarking system vary between applications, and so one criterion may be more important in some situations than in others. For example, a low computational cost may be vital to ensure that an algorithm can be implemented in real time on a given DSP system. The criteria are described in the following subsections.
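
As an illustration of the conventions in Section II-A, the following minimal Python sketch divides a host signal into blocks and generates a bipolar bit sequence with one bit per block. The function names, the handling of leftover samples, and the 44.1 kHz sampling rate are assumptions made for this example and are not taken from the paper.

```python
import random


def split_into_blocks(x, M):
    """Divide host signal x (length N) into B = N // M blocks of M samples.

    Assumption: trailing samples that do not fill a whole block are
    left out; the paper does not say how a remainder is handled.
    """
    if M <= 0:
        raise ValueError("block size M must be positive")
    B = len(x) // M
    return [x[m * M:(m + 1) * M] for m in range(B)]


def random_bipolar_bits(B, seed=0):
    """Return a bipolar sequence w(m) in {-1, +1} for 0 <= m <= B - 1."""
    rng = random.Random(seed)
    return [rng.choice((-1, 1)) for _ in range(B)]


def embedding_rate(fs, M):
    """Bits per second when one bit is embedded per block of M samples."""
    return fs / M


# One second of a placeholder host signal at an assumed 44.1 kHz rate.
fs = 44100
x = [0.0] * fs
blocks = split_into_blocks(x, M=441)
w = random_bipolar_bits(len(blocks))
```

Because one bit is embedded per block, halving the block size M doubles the number of embedded bits for the same host signal, which is the flexibility the block division is meant to provide.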
1) Bit Rate

Bit rate refers to the amount of watermark data that may be reliably embedded within a host signal per unit of time or space, such as bits per second or bits per pixel. A higher bit rate may be desirable in some applications in order to embed more copyright information. In this study, reliability was measured as the bit error rate (BER) of extracted watermark data. For embedded and extracted watermark sequences of length $B$ bits, the BER (in percent) is given by the expression:

$$\mathrm{BER} = \frac{100}{B} \sum_{n=0}^{B-1} \begin{cases} 1, & \tilde{w}(n) \ne w(n) \\ 0, & \tilde{w}(n) = w(n) \end{cases}$$

2) Perceptual Quality

Perceptual quality refers to the imperceptibility of embedded watermark data within the host signal. In most applications, it is important that the watermark is undetectable to a listener or viewer. This ensures that the quality of the host signal is not perceivably distorted, and