G-CAST: Gradient Based Image SoftCast for Perception-Friendly Wireless Visual Communication

Ruiqin Xiong¹, Hangfan Liu¹, Siwei Ma¹, Xiaopeng Fan², Feng Wu³ and Wen Gao¹
¹Institute of Digital Media, Peking University, Beijing 100871, China
²Department of Computer Science, Harbin Institute of Technology, Harbin 150001, China
³Microsoft Research Asia, Beijing 100080, China
Email: rqxiong@pku.edu.cn

Abstract

Conventional image and video communication systems are usually designed to maximize the fidelity of reconstructed images, as measured by mean square error (MSE). It is well known that MSE may not reflect the visual quality perceived by human eyes. Recent advances in image quality assessment show that structural similarity (SSIM), and gradient similarity in particular, reveals the perceptual fidelity of images more reliably. Inspired by this observation, this paper proposes a new image communication approach. It conveys the visual information of an image by transmitting the image gradients, and the decoder recovers the image from the received gradient data using statistical prior knowledge about images. In particular, we design a gradient-based image SoftCast scheme for wireless scenarios. Experimental results show that the proposed scheme produces reconstructed images with much better perceptual quality. This perceptual advantage is confirmed by the quality improvements measured by SSIM and by the gradient signal-to-noise ratio (GSNR).

1 Introduction

Today, mean square error (MSE) is still widely used as the fidelity measure for the design and optimization of image communication systems. For example, when a pixel block is predicted from neighboring pixels in already-reconstructed blocks, the prediction mode is usually selected to minimize the squared prediction error.
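To make this MSE-driven mode decision concrete, here is a minimal Python/NumPy sketch. It assumes three simplified predictors (vertical, horizontal and DC), loosely modeled on the intra prediction modes of block-based codecs. The functions `predict_block` and `select_mode` are hypothetical illustrations. They are not part of G-CAST or any particular codec.

```python
import numpy as np

def predict_block(top, left, mode):
    """Simplified intra predictors (hypothetical, for illustration only).

    top:  the N reconstructed pixels directly above the block
    left: the N reconstructed pixels directly to the left of the block
    """
    n = len(top)
    if mode == "vertical":    # copy the row above downward
        return np.tile(top, (n, 1))
    if mode == "horizontal":  # copy the left column rightward
        return np.tile(left[:, None], (1, n))
    if mode == "dc":          # flat block at the mean of all neighbors
        return np.full((n, n), (top.sum() + left.sum()) / (2 * n))
    raise ValueError(f"unknown mode: {mode}")

def select_mode(block, top, left, modes=("vertical", "horizontal", "dc")):
    """Pick the mode whose prediction has the minimal sum of squared errors."""
    costs = {
        m: float(np.sum((block - predict_block(top, left, m)) ** 2))
        for m in modes
    }
    best = min(costs, key=costs.get)
    return best, costs

# Example: a block of vertical stripes is predicted exactly by the vertical mode.
top = np.array([10.0, 50.0, 90.0, 130.0])
left = np.array([12.0, 11.0, 10.0, 9.0])
block = np.tile(top, (4, 1))
best, costs = select_mode(block, top, left)
```

The selection criterion is just the cost expression inside `select_mode`. Replacing it with another distortion measure would change which mode wins without touching the predictors.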
Similarly, rate-distortion optimization (RDO) is generally performed with MSE as the distortion measure. In addition, the adoption of orthogonal (or nearly orthogonal) decorrelating transforms (e.g., DCT or DWT) in existing image and video coding schemes can be interpreted as an implicit use of the MSE metric. This is because an orthogonal transform preserves MSE distortion, so a good approximation in the transform domain is guaranteed to be a good approximation in the signal domain in terms of MSE. MSE may indeed be a very good distortion metric for some signal processing tasks. However, it performs poorly in other applications and has been widely criticized for serious shortcomings, especially when dealing with perceptually important signals [1,2].

This work was supported in part by the National Natural Science Foundation of China (61073083, 61370114, 61121002), the Beijing Natural Science Foundation (4112026, 4132039) and the Research Fund for the Doctoral Program of Higher Education (20100001120027, 20120001110090).

2014 Data Compression Conference 1068-0314/14 $31.00 © 2014 IEEE DOI 10.1109/DCC.2014.55

Most pictures, still or moving, are meant to be viewed by people