F.Live: Towards Interactive Live Broadcast FTV Experience
Shannon Chen, Zhenhuan Gao, and Klara Nahrstedt
University of Illinois at Urbana-Champaign
{cchen116, zgao11, klara}@illinois.edu

Abstract— Free-viewpoint television (FTV) is a visionary application that provides an immersive experience to the audience with the freedom of changing viewpoint during video playout. However, live broadcasting and user interaction do not coexist in existing FTV systems. In this paper, we propose F.Live, a framework for FTV content dissemination that supports user-initiated viewpoint changing for live broadcasting. Simulation results of a large-scale experiment, based on the camera array settings of the existing Nagoya FTV system and the EyeVision system, show that F.Live is capable of supporting 100,000 concurrent viewers with free viewpoint selection, low user-interaction latency, and feasible bandwidth requirements.

Keywords—broadcasting, live streaming, FTV

I. INTRODUCTION

Free-viewpoint television (FTV) is a visionary application that provides an immersive experience of a broadcast physical event to the audience with the freedom of changing viewpoint during video playout. Through depth estimation and interpolation among multiple raw video streams captured from different angles, an FTV system can render a 3D space in which the audience can view the event from arbitrary viewpoints, as if they were physically in the same studio as the filmed objects. Large-scale camera array systems installed in outdoor stadiums and theaters also make it possible to capture sports events [1] and concerts [2] from a 360-degree view.

Yet two desirable characteristics, interactivity and live broadcasting, fail to coexist in current FTV systems. Generally, the content delivery chains of existing FTV prototypes can be classified into two types. Type-1 is designed for broadcasting live performances (Fig. 1a). Under this scenario, raw streams captured by a number of cameras are interpolated to create virtual camera streams.
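The interpolation step above can be illustrated with a deliberately simplified sketch. Real FTV pipelines synthesize virtual views via depth estimation and depth-image-based rendering; a plain weighted blend between two neighboring camera frames is only a stand-in to show where "virtual camera" streams come from. The function name and parameters here are illustrative and not taken from any cited system.

```python
import numpy as np

def virtual_view(left_frame: np.ndarray, right_frame: np.ndarray, t: float) -> np.ndarray:
    """Synthesize a virtual camera frame between two real cameras.

    t = 0.0 yields the left camera's view, t = 1.0 the right one.
    A linear blend is a gross simplification of depth-based view
    interpolation, used here only to convey the idea.
    """
    if not 0.0 <= t <= 1.0:
        raise ValueError("t must lie in [0, 1]")
    return ((1.0 - t) * left_frame + t * right_frame).astype(left_frame.dtype)

# Two tiny grayscale frames from adjacent cameras
left = np.zeros((2, 2), dtype=np.uint8)
right = np.full((2, 2), 200, dtype=np.uint8)
mid = virtual_view(left, right, 0.5)  # "virtual camera" halfway in between
```

In a production chain, frames from many camera pairs would be interpolated this way to densify the set of available viewpoints before broadcast.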
These virtual and real streams together grant smooth transitions between arbitrary viewpoints and hence create a free-viewpoint immersive experience. A commercialized system implementing this delivery chain has been deployed for live broadcasting of the Super Bowl [7]. However, the term "free" in this type of delivery can be misleading, because end users are deprived of the ability to interactively choose their own viewpoint. The freedom of dynamic viewpoint selection remains in the hands of the service provider: a director of the program selects the viewpoint for the audience, so only one view is broadcast at any given time.

The Type-2 FTV delivery chain is suitable for pre-recorded programs (Fig. 1b). Under this scenario, raw video streams captured by different cameras are aggregated into one huge free-viewpoint stream, which is delivered to the audience as a whole. During playout, the audience switches viewpoints arbitrarily by giving on-the-fly commands to their local display systems. The decoding module of the system decodes only the data necessary to render the scene corresponding to the user's viewpoint decision. Adopting this delivery chain, a working prototype comprising 100 cameras has been developed by the Tanimoto lab at Nagoya University [3]. However, delivering the huge free-viewpoint stream makes bandwidth and computation requirements inevitably high, so this chain is unsuitable for live broadcasting or any kind of low-buffer real-time streaming over modern data networks. For the Nagoya FTV system to stream at HDTV resolution (1080p), the estimated required bandwidth is 1.6 Gbps [4][5], which far exceeds typical network capacity. With multi-view video coding (MVC) [6], the bitrate can be lowered by 20~30%, but it remains in the gigabit range.

In this work, we propose a new framework and delivery chain for live and interactive broadcast FTV, called F.Live.
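A back-of-envelope check makes the cited figures concrete. Assuming (this per-stream bitrate is our assumption, not a number from the paper) that each 1080p camera stream is encoded at roughly 16 Mbps, a typical HDTV broadcast rate, the 100-camera aggregate matches the 1.6 Gbps estimate, and the reported 20~30% MVC saving still leaves a gigabit-scale stream:

```python
# Back-of-envelope check of the bandwidth figures cited above.
NUM_CAMERAS = 100        # Nagoya FTV prototype camera count [3]
PER_CAMERA_MBPS = 16     # assumed per-stream 1080p bitrate (illustrative)

aggregated_gbps = NUM_CAMERAS * PER_CAMERA_MBPS / 1000
print(f"Aggregated free-viewpoint stream: ~{aggregated_gbps:.1f} Gbps")

# MVC is reported to save 20~30% over coding each view independently
for saving in (0.20, 0.30):
    print(f"With MVC ({saving:.0%} saving): {aggregated_gbps * (1 - saving):.2f} Gbps")
```

Even at the optimistic 30% saving, the stream stays above 1 Gbps, which is why Type-2 delivery is impractical for live broadcast over consumer links.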
As seen in the previous two types of FTV delivery chains, at any given time a user watches the free-viewpoint video from only one chosen view. The thick lines in Fig. 1 indicate the aggregated free-viewpoint stream, which contains massive amounts of information. Note that the aggregated stream never reaches the end user, who needs the content of only one viewpoint at any given time. Thus, the earlier the viewpoint decision is made, the more bandwidth and computation resources can be saved across the overall delivery.

Based on this observation, we propose a new view-based delivery chain. The new chain introduces a session manager entity that coordinates interactive user requests and the distribution of raw video streams to different end users right after the streams are captured (Fig. 2). No aggregation takes place anywhere in the delivery chain. The rendering module is moved to the audience side, so that users receive only the minimum necessary streams to construct the scene they wish to see. Compared to the bandwidth saving of MVC (20~30%), the view-based delivery chain can save at least 50%.

Since the aggregation module is eliminated, dissemination of raw streams becomes more elastic. Each camera at the producer site can be seen as an independent content producer entity in the system model. In addition, the session manager becomes an independent entity that sits on top of the dissemination network to control the content flows. Thus, content producers, audience sites, and the session manager form a distributed P2P delivery overlay, which helps alleviate the transmission burden on the producer site through content sharing. Due to its unique service
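The view-based chain described above can be sketched as a session manager that maps each user's current viewpoint to the small subset of camera streams needed to render it on the client side. All class and method names below are hypothetical (the paper does not specify this API), and the circular, evenly spaced camera array is an assumption for illustration:

```python
class SessionManager:
    """Toy model of a view-based session manager: given a user's
    viewpoint (an angle in degrees), return only the cameras needed
    to render it, instead of shipping the whole aggregated stream."""

    def __init__(self, num_cameras: int):
        self.num_cameras = num_cameras
        # Assumption: cameras evenly spaced on a 360-degree circular array
        self.spacing = 360.0 / num_cameras

    def cameras_for_viewpoint(self, angle_deg: float) -> list:
        # The two physical cameras bracketing the requested angle;
        # their streams suffice for client-side view interpolation.
        left = int(angle_deg // self.spacing) % self.num_cameras
        right = (left + 1) % self.num_cameras
        return [left, right]

mgr = SessionManager(num_cameras=100)
streams = mgr.cameras_for_viewpoint(45.7)  # user looks from ~45.7 degrees
```

Delivering 2 of 100 streams instead of the aggregate trivially clears the 50% saving mentioned above; the exact subset size depends on the renderer's interpolation method.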