Split & Dual Screen Comparison of Classic vs Object-based Video

Maarten Wijnants, Hasselt University – tUL, Expertise Centre for Digital Media, Diepenbeek, Belgium, maarten.wijnants@uhasselt.be
Sven Coppers, Hasselt University – tUL, Expertise Centre for Digital Media, Diepenbeek, Belgium, sven.coppers@uhasselt.be
Gustavo Rovelo Ruiz, Hasselt University – tUL, Expertise Centre for Digital Media, Diepenbeek, Belgium, gustavo.roveloruiz@uhasselt.be
Peter Quax, Hasselt University – tUL – Flanders Make – EDM, Diepenbeek, Belgium, peter.quax@uhasselt.be
Wim Lamotte, Hasselt University – tUL, Expertise Centre for Digital Media, Diepenbeek, Belgium, wim.lamotte@uhasselt.be

ABSTRACT
Over-the-top (OTT) streaming services like YouTube and Netflix generate massive amounts of video data, thereby putting substantial pressure on network infrastructure. This paper describes a demonstration of the object-based video (OBV) methodology, which allows the background and the foreground object(s) of a video scene to be streamed over MPEG-DASH at independently chosen quality levels. The OBV methodology is inspired by research into human visual attention and foveated compression, in that it allows bitrate to be assigned adaptively and dynamically to those portions of the visual scene that have the highest utility in terms of perceptual quality. Using a content corpus of interview-like video footage, the described demonstration shows the OBV methodology's potential to reduce video bitrate requirements while incurring at most a marginal perceptual impact (i.e., in terms of subjective video quality). Thanks to its standards-compliant Web implementation, the OBV methodology is directly and broadly deployable without requiring capital expenditure.
CCS CONCEPTS
• Information systems → Multimedia streaming; Web applications; • Networks → Public Internet; Application layer protocols; • Human-centered computing → User studies;

KEYWORDS
video coding, H.264, HTTP Adaptive Streaming, MPEG-DASH, subjective evaluation, Web

ACM Reference Format:
Maarten Wijnants, Sven Coppers, Gustavo Rovelo Ruiz, Peter Quax, and Wim Lamotte. 2019. Split & Dual Screen Comparison of Classic vs Object-based Video. In Proceedings of the 27th ACM International Conference on Multimedia (MM '19), October 21–25, 2019, Nice, France. ACM, New York, NY, USA, 3 pages. https://doi.org/10.1145/3343031.3350582

Permission to make digital or hard copies of part or all of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for third-party components of this work must be honored. For all other uses, contact the owner/author(s).
© 2019 Copyright held by the owner/author(s). ACM ISBN 978-1-4503-6889-6/19/10.

1 INTRODUCTION AND MOTIVATION
Massive amounts of video traffic are being delivered over the contemporary Internet [14], and these traffic volumes are poised to increase even further in the near future [5]. However, not all spatial regions of a video are equally important from a perceptual quality perspective. Evidence abounds in the literature that human visual focus during video consumption is subconsciously drawn to a limited number of salient objects or regions that stand out from the video background [2, 7, 15]. For instance, human faces are known to be natural attractors of visual attention [3, 8].
At the same time, complementary research shows that the sampling density and sensitivity of the Human Visual System (HVS) gradually decay as the distance to our eyes' fixation area increases [3, 9]. Stated differently, only those spatial portions of the video on which viewers fixate are processed by the HVS at maximal fidelity, while more distant regions are assigned ever lower processing bandwidth.

Driven by these observations, we have previously introduced the object-based video (OBV) methodology [18], which allows for the adaptive delivery of a visual scene by decomposing it into a background and one or more foreground objects, each of which can be independently streamed in a quality-variant manner. As such, the OBV methodology supports the introduction of intra-scene quality differences during network delivery. In contrast, a classic frame-based video encoder does not grant such versatility, since it enacts a rather uniform bitrate, and hence quality, distribution over the entire visual scene. The OBV methodology can be regarded as a video-only specialization of the more general object-based media (OBM) paradigm [16] that is currently under active investigation by both academia [13] and industry [10]. This demonstration illustrates the viability of leveraging video-only OBM mechanisms to achieve significant cost reductions (i.e., in terms of network load) that entail no or at most limited repercussions with respect to perceived video quality.

2 OBV IMPLEMENTATION
The OBV methodology is implemented exclusively using standardized Web technologies (i.e., HTML5, CSS, JavaScript, WebGL) to yield a portable, multi-platform solution that is accessible via an off-the-shelf Web browser [18]. The methodology takes as input a video that has been disassembled into multiple visual signals, one for the background and one per individual foreground object. In each such signal, pixels that have been segmented away are replaced by a fixed