REAL-TIME GENERATION OF MULTI-VIEW VIDEO PLUS DEPTH CONTENT USING MIXED NARROW AND WIDE BASELINE

Frederik Zilly 1,2, Christian Riechert 1, Marcus Müller 1, Peter Eisert 1, Thomas Sikora 2, Peter Kauff 1

1 Fraunhofer Institute for Telecommunications, Heinrich Hertz Institute, Berlin, Germany
2 Technische Universität Berlin, Germany

frederik.zilly@hhi.fraunhofer.de

ABSTRACT

Content production for stereoscopic 3D-TV displays has matured in recent years. The content is usually shot with two cameras, as the glasses-based target devices require two views as input. Besides stereoscopic 3D-TVs, considerable progress has also been made in improving the image quality of glasses-free auto-stereoscopic displays and light-field displays. For the latter two display families, the content production workflow is less established and more complex, as the number of required views not only differs considerably but is also likely to increase in the near future. As all 3D display families can be expected to coexist in the coming years, the goal is an efficient content production workflow that yields high-quality content for all 3D-TV displays. Consequently, the quality of the cameras should be comparable to state-of-the-art HD-TV cameras, and the whole content acquisition process, from calibration to post-production, should be efficient and feasible within a standard studio environment or on a film set. Hence, we seek to avoid the large arrays of machine vision cameras that have traditionally been used in laboratory environments to produce multi-view camera content.
Against this background, we present a real-time content production workflow based on a four-camera rig with a central narrow baseline, formed by two cameras mounted on a standard beam-splitter rig known from stereoscopic 3D productions, and a wide baseline comprising two satellite cameras mounted outside the mirror box. As all four cameras are positioned on a common baseline, they form a linear camera array. In this paper, we describe in detail the multi-view video plus depth generation workflow optimized for this specialized four-camera setup. Experimental results show that the proposed 3D production workflow can be conducted within a standard studio environment. The generated multi-view video plus depth data is suitable for high-quality view generation along the whole baseline through depth image based rendering for auto-stereoscopic and light-field displays. Moreover, a native stereo pair is generated with an inherent depth volume suitable for direct reproduction on a glasses-based 3D display without any further processing.

Index Terms — 3D-TV, depth image based rendering, DIBR, depth estimation, multi-view video plus depth, MVD4, real-time
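The view generation along the baseline mentioned above relies on depth image based rendering for a rectified linear camera array: each pixel is shifted horizontally by a fraction of its full-baseline disparity d = f·B/Z. The following is a minimal sketch of this idea, not the paper's actual pipeline; the function name, the simple forward warping, and the painter's-algorithm occlusion handling are illustrative assumptions.

```python
import numpy as np

def dibr_shift(image, depth, focal_px, baseline_m, alpha):
    """Sketch of 1-D depth image based rendering (DIBR) for a rectified
    linear camera array. Each pixel is shifted horizontally by a fraction
    'alpha' (0..1, position of the virtual view along the baseline) of its
    full-baseline disparity d = f * B / Z.
    Hypothetical helper for illustration, not the paper's implementation."""
    h, w = depth.shape
    out = np.zeros_like(image)
    filled = np.zeros((h, w), dtype=bool)
    # Per-pixel horizontal shift in pixels for the virtual view position.
    disparity = alpha * focal_px * baseline_m / depth
    # Forward-warp far-to-near so nearer pixels (larger disparity) overwrite
    # farther ones (painter's algorithm).
    order = np.argsort(depth, axis=None)[::-1]  # far first, near last
    ys, xs = np.unravel_index(order, depth.shape)
    xt = np.round(xs + disparity[ys, xs]).astype(int)
    valid = (xt >= 0) & (xt < w)
    out[ys[valid], xt[valid]] = image[ys[valid], xs[valid]]
    filled[ys[valid], xt[valid]] = True
    # Pixels with filled == False are disocclusion holes that a real
    # pipeline would inpaint or fill from the opposite view.
    return out, filled
```

In practice, a second warp from the view on the other side of the virtual position is typically blended in to fill disocclusions, which is one reason a linear multi-camera array with depth per view is attractive for this kind of rendering.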