A WATERMARKING CO-PROCESSOR FOR NEW GENERATION GRAPHICS PROCESSING UNITS

Saraju P. Mohanty, Computer Science and Engineering, Univ of North Texas, TX 76203. Email: smohanty@cse.unt.edu
Nishikanta Pati, Computer Science and Engineering, Univ of North Texas, TX 76203. Email: nishi@unt.edu
Elias Kougianos, Electrical Engineering Technology, Univ of North Texas, TX 76203. Email: eliask@unt.edu

Abstract: Recent growth of high speed internet and high resolution imaging has enabled electronic storage and transfer of digital multimedia content without loss of quality. To prevent illegal reproduction of digital multimedia elements, many researchers have suggested digital watermarking as a feasible solution. Like other signal and image processing tasks, digital watermarking is computationally intensive. For efficient, high performance, real-time and low cost watermarking we propose two alternatives: (1) using the Graphics Processing Unit (GPU) available on modern graphics cards for the complex mathematical computations, or (2) implementing a dedicated processor chip, a co-processor for the GPU, to accomplish the task. In this paper we present the latter alternative, a co-processor for the GPU that performs multimedia watermarking for real-time applications.

I. INTRODUCTION AND RELATED RESEARCH

Watermarking technology makes it possible to identify the creator, distributor or authorized consumer of a multimedia element [1]. To be effective, a watermark should be perceptually invisible, and the data owner or an independent control authority should be able to extract it easily for authentication. Moreover, it must be difficult for an attacker to remove or destroy the embedded watermark. Many applications, such as video broadcasting, require the copyrighting process to run in real time at video frame rates [2]. Many significant contributions [3], [5], [6] have been made to the field of digital watermarking.
Most robust watermarking methods need computationally intensive operations. This constraint forces watermark processing to be done off-line. However, real-time insertion of the watermark, as and when the data is recorded, is very useful for applications such as video broadcasting and traffic monitoring.

Nowadays, most graphics cards have a powerful processor chip, referred to as the Graphics Processing Unit (GPU), to accelerate graphics processing. Recent performance improvements in graphics cards have generated considerable interest in the research community in harnessing the power of the GPU for general-purpose computing [4], [7]. Fung et al. [8] have proposed using graphics cards for efficient computation of computer vision algorithms, leaving the CPU free for other tasks.

Inspired by the challenges and ventures discussed above, we propose a dedicated co-processor chip targeted at real-time watermarking. The chip will be integrated as a co-processor to the existing GPU. We have customized a well-known robust invisible image watermarking algorithm to facilitate building the real-time architecture of the watermarking processor, followed by its hardware implementation. The proposed architecture is designed for easy integration as a module into any existing GPU.

II. THE PROPOSED NEW GENERATION GPU

Fig. 1 depicts the proposed schematic architecture for a new generation GPU with a watermarking processor. The graphics pipeline blocks shown in Fig. 1 are fundamental to all modern GPUs, as described by Owens et al. [4].

[Fig. 1. The Proposed GPU Architecture with Watermarking Co-Processor. Blocks labeled in the figure: Vertex Buffer, Vertex Processor, Rasterization, Fragment Processor, Frame Buffer, Texture Buffer, Watermarking Processor, with an EN/DE input to the Watermarking Processor.]

We propose to add a watermarking processor to the existing hardware organization to enable the GPU to watermark multimedia elements, such as images, efficiently.
For normal graphics applications, the pipeline takes a list of geometry, expressed in terms of vertices, as input. The functional blocks perform the necessary processing and render the final image into the frame buffer. For real-time watermarking, however, the watermarking processor is activated. The watermarking processor takes its data from the global memory inside the texture unit. Depending on the EN/DE input, which selects whether to embed or extract a watermark, the watermarking processor executes the required instructions. The output of the watermarking processor is saved in the frame buffer. When a watermark is embedded into the host image, the output is the watermarked image; during extraction, the output is a binary signal indicating the presence or absence of a watermark.

III. THE WATERMARKING ALGORITHM

We present an invisible robust watermarking algorithm amenable to hardware implementation, based on the methodology proposed by Piva et al. [5]. The algorithm is blind: it does not need the original image for extraction and authentication. It enables us to design an efficient VLSI architecture without losing the robustness of blind invisible watermarking. The steps of invisible watermark insertion are described below:

1) Find the 8 × 8 block-wise DCT of the host image.