Humans are still the best lossy image compressors

Ashutosh Bhown 1,*, Soham Mukherjee 2,*, Sean Yang 3,*, Shubham Chandak 4, Irena Fischer-Hwang 4, Kedar Tatwawadi 4, Tsachy Weissman 4
1 Palo Alto High School  2 Monta Vista High School  3 Saint Francis High School  4 Stanford University
schandak@stanford.edu

Abstract

Lossy image compression has been studied extensively in the context of typical loss functions such as RMSE, MS-SSIM, etc. However, it is not well understood what loss function might be most appropriate for human perception. Furthermore, the availability of massive public image datasets appears to have hardly been exploited in image compression. In this work, we perform compression experiments in which one human describes images to another, using publicly available images and text instructions. These image reconstructions are rated by human scorers on the Amazon Mechanical Turk platform and compared to reconstructions obtained by existing image compressors. In our experiments, the humans outperform the state-of-the-art compressor WebP in the MTurk survey on most images, which shows that there is significant room for improvement in image compression for human perception.

Data: The images, results and additional data are available at https://compression.stanford.edu/human-compression.

Introduction

Since the advent of electronic media, image compression has been studied extensively, leading to multiple image formats and compression techniques such as PNG [1], JPEG [2], JPEG2000 [3], JPEG XR [4], BPG [5] and WebP [6]. In order to achieve significant reduction in image size, most compression techniques allow some loss while compressing images. However, the loss functions used do not correspond to human perception, and the resulting images may be blurry and unnatural at high loss levels. The left two panels of Figure 1 show an example in which compression and reconstruction using WebP [6] results in a severely blurred image.
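Loss measures such as the RMSE mentioned above average pixel-wise differences, so two reconstructions with very different perceptual quality can score identically. As a minimal sketch of what such a pixel-wise loss computes (the function name and toy arrays are illustrative, not from the paper):

```python
import numpy as np

def rmse(img_a, img_b):
    """Root-mean-square error between two images of equal shape."""
    a = img_a.astype(np.float64)
    b = img_b.astype(np.float64)
    return np.sqrt(np.mean((a - b) ** 2))

# Two hypothetical 4x4 grayscale images differing by a constant offset.
x = np.zeros((4, 4))
y = np.full((4, 4), 3.0)
print(rmse(x, y))  # 3.0
```

Note that a uniform brightness shift and localized structural damage can yield the same RMSE, which is one reason perceptually motivated metrics like MS-SSIM were introduced.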
It seems natural to posit that better compression results can be achieved using a loss function optimized for human perception. We refer to such a loss function as "human-centric." The rightmost panel of Figure 1 shows an example of a possible human-centric reconstruction which prioritizes image content over pixel-by-pixel fidelity of grass texture. Indeed, there has been a large body of work in the computer vision community [7][8][9] aimed at better understanding human perception, and hence the loss function governing human vision. Some compression methods, for example, take advantage of the fact that human vision is more sensitive to differences in intensity than in color, and quantize color space more coarsely than intensity space in order to achieve better compression performance.

* These authors contributed equally to this work. Most of this work was performed as part of their summer internship at the Stanford Electrical Engineering department.

arXiv:1810.11137v2 [eess.IV] 29 Oct 2018
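The intensity-versus-color tradeoff described above is commonly realized via chroma subsampling: the image is represented in a luma/chroma space (e.g. YCbCr) and the chroma channels are stored at reduced resolution while luma is kept intact. A minimal sketch of 4:2:0-style subsampling (the helper name and 2x2 block averaging are illustrative assumptions, not the paper's method):

```python
import numpy as np

def subsample_chroma(ycbcr):
    """Keep the luma (Y) channel at full resolution and average each
    non-overlapping 2x2 block of the chroma (Cb, Cr) channels,
    i.e. 4:2:0-style chroma subsampling on an HxWx3 array."""
    y = ycbcr[..., 0]
    h, w = y.shape
    cb = ycbcr[..., 1].reshape(h // 2, 2, w // 2, 2).mean(axis=(1, 3))
    cr = ycbcr[..., 2].reshape(h // 2, 2, w // 2, 2).mean(axis=(1, 3))
    return y, cb, cr

# Hypothetical 4x4 image already in a luma/chroma space.
img = np.arange(4 * 4 * 3, dtype=np.float64).reshape(4, 4, 3)
y, cb, cr = subsample_chroma(img)
print(y.shape, cb.shape, cr.shape)  # (4, 4) (2, 2) (2, 2)
```

This halves each chroma dimension, cutting the stored samples per pixel from 3 to 1.5 before any entropy coding, with little perceptual cost for most natural images.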