The Next Generation of In-home Streaming: Light Fields, 5K, 10 GbE, and Foveated Compression

Daniel Pohl, Intel Corporation, Saarland Informatics Campus, Saarbruecken, Germany (daniel.pohl@intel.com)
Daniel Jungmann, ArKaos S.A., Chaussée de Waterloo 198, B-1640 Rhode-Saint-Genèse, Belgium (el.3d.source@gmail.com)
Bartosz Taudul, Huuuge Games, Mickiewicza 53, Szczecin, Poland (wolf.pld@gmail.com)
Richard Membarth, DFKI, Saarland Informatics Campus, Saarbruecken, Germany (richard.membarth@dfki.de)
Harini Hariharan and Thorsten Herfet, Saarland University, Saarland Informatics Campus, Saarbruecken, Germany ({hariharan,herfet}@nt.uni-saarland.de)
Oliver Grau, Intel Corporation, Saarland Informatics Campus, Saarbruecken, Germany (oliver.grau@intel.com)

Abstract—Interacting with real-time rendered 3D content from powerful machines on smaller devices is becoming ubiquitous through commercial products that enable in-home streaming within the same local network. However, high-resolution, low-latency in-home streaming at high image quality remains a challenging problem. To address this, we enhance an existing open-source framework for in-home streaming. We add highly optimized DXT1 (DirectX Texture Compression) support for thin desktop and notebook clients. For rendered light fields, we improve the encoding algorithms to achieve higher image quality. Within a 10 Gigabit Ethernet (10 GbE) network, we achieve streaming at up to 5K resolution at 55 frames per second. Through new low-level algorithmic improvements, we increase the compression speed of ETC1 (Ericsson Texture Compression) by a factor of 5. We are the first to bring ETC2 compression to real-time speed, which increases the streamed image quality. Finally, we reduce the required data rate by more than a factor of 2 through foveated compression with real-time eye tracking.

Index Terms—in-home streaming, ETC1, ETC2, DXT1, light fields.

I. INTRODUCTION

“In-home streaming” refers to interacting, on a thin client, with real-time content that is generated on a more powerful computing device. The user’s inputs are forwarded to the server, which processes them and sends updated video back to the client. “In-home” refers to a local network, either wired or wireless, but not the Internet. Ideally, in-home streaming is transparent to the user, delivering the perception that the interactively streamed application is running locally on the target device. To achieve this, the latency between user input and screen update needs to stay below 100 ms [1], and the image quality needs to be high and free of noticeable artifacts. Compared with state-of-the-art approaches, these requirements still leave room for significant improvement, as we will show in this paper. Our contributions extend an open-source in-home streaming approach [2] with the following features:

- Support for multiplexed rendered light field images
- Higher image quality through ETC2 support
- Optimizations for ETC1, ETC2, and DXT1 encoding
- Streaming at up to 5K resolution using 10 GbE
- Foveated compression through real-time eye tracking

II. RELATED WORK

The idea of controlling one compute device from another has been around for a long time. Desktop-sharing applications such as Microsoft Remote Desktop and VNC (Virtual Network Computing) [3] are widely used, but they are optimized only for 2D content. Cloud gaming approaches like “PlayStation Now” target lower bandwidth and use H.264 [4] compression. Dedicated in-home streaming solutions that support 3D real-time rendered content can deliver a high Quality of Experience without the latency of the Internet and with the higher data rates available in a local network. We compare our approach with the most widely known in-home streaming products, all of which use H.264 internally: SplashTop, NVIDIA Shield Android TV Box, and Steam.
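DXT1, one of the texture compression formats listed above, encodes every 4x4 pixel block into 64 bits: two RGB565 endpoint colors plus sixteen 2-bit indices into a four-entry palette interpolated between the endpoints. The sketch below illustrates that block structure with the simplest possible endpoint heuristic; it is not the paper's optimized encoder, and the real format additionally switches between a 3- and 4-color mode based on the endpoint ordering, which is omitted here.

```python
def rgb_to_565(r, g, b):
    """Quantize an 8-bit RGB triple to packed RGB565."""
    return ((r >> 3) << 11) | ((g >> 2) << 5) | (b >> 3)

def encode_dxt1_block(pixels):
    """Encode a 4x4 block of 16 (r, g, b) tuples into DXT1-style data.

    Endpoint selection here is a naive heuristic (brightest and
    darkest pixel by approximate luminance); production encoders fit
    a line through the block's colors instead.
    """
    lum = lambda p: 2 * p[0] + 4 * p[1] + p[2]
    c0 = max(pixels, key=lum)
    c1 = min(pixels, key=lum)
    # Four-color palette: the two endpoints plus two interpolated colors.
    palette = [
        c0,
        c1,
        tuple((2 * a + b) // 3 for a, b in zip(c0, c1)),
        tuple((a + 2 * b) // 3 for a, b in zip(c0, c1)),
    ]
    # Give each pixel the 2-bit index of its nearest palette entry.
    indices = 0
    for i, p in enumerate(pixels):
        dist = lambda c: sum((a - b) ** 2 for a, b in zip(p, c))
        best = min(range(4), key=lambda k: dist(palette[k]))
        indices |= best << (2 * i)
    # 16 bits per endpoint + 32 bits of indices = 64 bits per block.
    return rgb_to_565(*c0), rgb_to_565(*c1), indices
```

Each 64-bit block replaces 48 bytes of raw RGB data, a fixed 6:1 ratio, which makes the bandwidth of a DXT1-compressed stream exactly predictable from the resolution and frame rate.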
Current in-home streaming approaches are optimized for sequences of regular 2D images generated by rendering 3D real-time content. Auto-stereoscopic and light field displays are receiving renewed attention, as they enable the perception of stereoscopic content without glasses [5]. To drive these displays, multiple views of the scene are rendered and multiplexed together into a single 2D image. This can create high-frequency content in the multiplexed image that does not correspond to high frequencies in the individual views and hence can lead to artifacts when classical image or video coding standards are used.

Proceedings of the Federated Conference on Computer Science and Information Systems, pp. 663–667. DOI: 10.15439/2017F16. ISSN 2300-5963, ACSIS, Vol. 11. IEEE Catalog Number: CFP1785N-ART. © 2017, PTI.
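The view multiplexing described above can be sketched as a per-pixel view selection: each output pixel samples one of the N rendered views according to the display's interleaving pattern. The column-interleaved layout below is illustrative only; the pattern of a real lenticular or parallax-barrier display is device-specific and typically operates at sub-pixel granularity.

```python
def multiplex_views(views, width, height):
    """Interleave N same-sized views into one 2D image, column by column.

    `views` is a list of N images, each indexed as view[y][x].
    Output pixel (x, y) is taken from view (x mod N), so neighboring
    output pixels can come from different viewpoints.
    """
    n = len(views)
    return [[views[x % n][y][x] for x in range(width)]
            for y in range(height)]
```

Even when every input view is perfectly smooth, adjacent output columns alternate between viewpoints, which is exactly the artificial high-frequency content that trips up classical image and video codecs.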