Abstract— This paper presents a solution for inserting graphics into a 3D HDTV signal in real time. It focuses on optimizing the insertion process in order to overcome the depth cue conflicts between occlusion and stereoscopic disparity that the insertion causes, thus providing a method to control the 3D quality of experience (QoE) after the insertion. The authors propose an algorithm based on disparity estimation, modified to meet the needs of this particular task. Building on Census transform disparity estimation, the introduction of reduced disparity maps and mode filters makes it possible to increase the disparity range and achieve subjective visual improvements without increasing the physical resources required. The trade-off between avoiding visual discomfort and the performance of objective stereoscopic analysis is a key point of the study, because it makes it possible to avoid computationally heavy algorithms while keeping the quality of experience at a high level. The system is intended to add On-Screen Displays, device or channel logos, advertising or even subtitles to a 3D HDTV workflow on general-purpose hardware, meaning that it can fit in consumer electronic devices as well as in professional equipment.
Index Terms— 3D High Definition Television (3D-HDTV), Field Programmable Gate Array (FPGA), Census Transform, Stereo Vision, Real-Time Video Processing.
I. INTRODUCTION
When dealing with two-dimensional video, inserting new graphical elements such as subtitles, message boards, On-Screen Displays (OSD) [1] or channel logos is just a matter of overlapping images. When the overlapped graphic visually blocks an object of the original image, the graphic appears nearer to the observer than the rest of the image. 3DTV and most 3D cinema are based on stereo vision systems, which offer the viewer a pair of stereo images, one for each eye; the difference between an object's position in the two images is interpreted by the human brain as a difference in depth.
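The Census transform disparity estimation on which the method is built can be illustrated in software. The sketch below is a minimal, unoptimized Python version under stated assumptions: a 3×3 Census window, edge replication at image borders, Hamming-distance matching costs summed over a 3×3 aggregation window, and a winner-take-all choice of disparity. The names and parameters (`census_transform`, `census_disparity`, `radius`, `agg`, `max_disp`) are hypothetical; this is not the authors' FPGA pipeline, and it does not reproduce their reduced disparity maps.

```python
def census_transform(img, radius=1):
    """Encode each pixel as a bit string: bit = 1 where a neighbor is darker than the centre."""
    h, w = len(img), len(img[0])
    out = [[0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            centre = img[y][x]
            bits = 0
            for dy in range(-radius, radius + 1):
                for dx in range(-radius, radius + 1):
                    if dy == 0 and dx == 0:
                        continue  # the centre is not compared with itself
                    # Assumption: borders are handled by edge replication (clamping).
                    yy = min(max(y + dy, 0), h - 1)
                    xx = min(max(x + dx, 0), w - 1)
                    bits = (bits << 1) | (1 if img[yy][xx] < centre else 0)
            out[y][x] = bits
    return out


def hamming(a, b):
    """Number of differing bits between two Census codes."""
    return bin(a ^ b).count("1")


def census_disparity(left, right, max_disp, radius=1, agg=1):
    """Winner-take-all disparity: left pixel x is matched to right pixel x - d."""
    cl = census_transform(left, radius)
    cr = census_transform(right, radius)
    h, w = len(left), len(left[0])
    disp = [[0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            best_d, best_cost = 0, None
            for d in range(min(max_disp, x) + 1):
                # Sum Hamming costs over a (2*agg+1)^2 window, clamped at borders.
                cost = 0
                for dy in range(-agg, agg + 1):
                    yy = min(max(y + dy, 0), h - 1)
                    for dx in range(-agg, agg + 1):
                        xl = min(max(x + dx, 0), w - 1)
                        xr = min(max(x + dx - d, 0), w - 1)
                        cost += hamming(cl[yy][xl], cr[yy][xr])
                if best_cost is None or cost < best_cost:
                    best_d, best_cost = d, cost  # ties keep the smaller disparity
            disp[y][x] = best_d
    return disp


# Usage: a bright column at x = 5 in the left view appears at x = 3 in the right view.
left = [[100 if x == 5 else 0 for x in range(10)] for _ in range(5)]
right = [[100 if x == 3 else 0 for x in range(10)] for _ in range(5)]
disp = census_disparity(left, right, max_disp=4)
print([row[5] for row in disp])  # [2, 2, 2, 2, 2]
```

Because the Census code only records the ordering of intensities around each pixel, the matching cost is insensitive to gain and offset differences between the two cameras, and it reduces to XOR and bit counting, which is one reason it suits hardware. Textureless regions, however, produce identical codes for many candidate disparities, which is one way simple real-time estimators end up with scattered errors.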
In this case, overlapping graphics in both the left and right images is not enough, as depth information now depends not only on visual blocking (occlusion) but also on the disparity between the two images. Consequently, when a graphic is inserted into the image, its disparity and its occlusions must be coherent. To achieve this goal it is necessary to know the depth of the different objects that make up the scene. An object situated closer to the observer than the graphic to be inserted should occlude the graphic, while the graphic should occlude the object if its disparity between the stereo images indicates that it is nearer to the observer than the object. Thus, a proper insertion of a graphic requires information about the depth of the scene. Given the nature of the three-dimensional effect, a classical stereo vision approach to disparity estimation suits the need for a depth map. Real-time processing is highly computationally demanding. That is why most real-time stereo systems adopt simple algorithms, which are usually much faster than complex ones but less accurate. When the depth map obtained from these algorithms is used for a specific purpose, further processing can be applied to mitigate this lower performance. Visual discomfort caused by false occlusions is not deterministic, so no objective methods have been defined to determine the quality of such an algorithm. False occlusions cause visual discomfort when the viewer perceives an inconsistency between the disparity of an object and the occlusions it creates. A small number of scattered errors in a depth map will have little effect on a global error measure, but can create annoying discontinuities in inserted graphics.
Manuscript received July 14, 2011. The authors are with the Signals, Systems and Radio communications Department of the E.T.S. Ingenieros de Telecomunicación, UPM, 28040 Madrid, Spain (e-mail: jrf@gatv.ssr.upm.es; djb@gatv.ssr.upm.es; jmm@gatv.ssr.upm.es).
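The occlusion rule described above, together with the removal of scattered disparity errors, can be sketched as follows. This is a minimal sketch under assumptions that do not come from the paper: the disparity map is given for the left view, a larger disparity means nearer to the viewer, the graphic is drawn at column `x` in the left view and `x - g_disp` in the right view, and the visibility decided in the left view is reused for the right view (a simplification). `mode_filter` replaces each disparity by the most frequent value in its neighborhood, in the spirit of the mode filters named in the abstract; `insert_graphic` and all parameter names are hypothetical.

```python
from collections import Counter


def mode_filter(disp, radius=1):
    """Replace each disparity by the most frequent value in its (2*radius+1)^2 window."""
    h, w = len(disp), len(disp[0])
    out = [[0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            counts = Counter()
            for dy in range(-radius, radius + 1):
                for dx in range(-radius, radius + 1):
                    yy, xx = y + dy, x + dx
                    if 0 <= yy < h and 0 <= xx < w:  # window is truncated at borders
                        counts[disp[yy][xx]] += 1
            # Highest count wins; ties go to the smaller (farther) disparity.
            out[y][x] = max(counts.items(), key=lambda kv: (kv[1], -kv[0]))[0]
    return out


def insert_graphic(left, right, disp, graphic, mask, gx, gy, g_disp):
    """Overlay a graphic at disparity g_disp, hiding pixels where the scene is nearer."""
    left = [row[:] for row in left]  # work on copies; inputs stay untouched
    right = [row[:] for row in right]
    h, w = len(left), len(left[0])
    for j, grow in enumerate(graphic):
        for i, value in enumerate(grow):
            y, xl = gy + j, gx + i
            if not mask[j][i] or not (0 <= y < h and 0 <= xl < w):
                continue
            # Assumption: larger disparity = nearer, so the scene occludes the graphic here.
            if disp[y][xl] > g_disp:
                continue
            left[y][xl] = value
            xr = xl - g_disp  # same point as seen in the right view
            if 0 <= xr < w:
                right[y][xr] = value
    return left, right


# Usage: an isolated wrong disparity is removed, then a graphic is partly occluded.
noisy = [[1, 1, 1], [1, 9, 1], [1, 1, 1]]
print(mode_filter(noisy))  # [[1, 1, 1], [1, 1, 1], [1, 1, 1]]

scene_disp = [[5 if x >= 5 else 1 for x in range(8)] for _ in range(3)]
blank = [[0] * 8 for _ in range(3)]
l, r = insert_graphic(blank, blank, scene_disp, [[7, 7, 7, 7]], [[True] * 4],
                      gx=2, gy=1, g_disp=2)
print(l[1])  # [0, 0, 7, 7, 7, 0, 0, 0]
print(r[1])  # [7, 7, 7, 0, 0, 0, 0, 0]
```

A mode filter is a natural choice here because its output is always a disparity already present in the window, so it removes isolated outliers without inventing intermediate depths at object boundaries, which an averaging filter would do. In the example, the fourth graphic pixel falls on the near object (disparity 5 > 2) and is hidden in both views.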
The proposed hardware architecture is fully pipelined and covers the whole path of the video signal, from dual-link reception to the transmission of a multiplexed, composed 3D video signal over HD-SDI. It has been designed to fit in a Virtex-II Pro FPGA. The system is driven by a global clock running at 74.25 MHz in order to support HD-SDI processing (1920×1080 pixels at 25 frames/s or 1280×720 pixels at 50 frames/s); since the system is fully pipelined, the resolution or frame rate can be increased up to a global clock of 200 MHz.
Real-Time 3-D HDTV Depth Cue Conflict Optimization
Juan Antonio Rodrigo, David Jiménez and José Manuel Menéndez
2011 IEEE International Conference on Consumer Electronics - Berlin (ICCE-Berlin) 978-1-4577-0234-1/11/$26.00 ©2011 IEEE
II. RELATED WORK
A. Augmented Reality
Inserting graphics into a 3D-HDTV video stream can be regarded as an augmented reality situation. In augmented reality, virtual elements are introduced into a real scene. Graphic enhancement has to deal with depth cue conflicts when a 3D display is used. Studies such as [2] state the relevance of both occlusion and disparity cues and show how incoherence between them can prevent a viewer from understanding the depth structure of a scene. Aligning virtual and real objects is necessary to create a comprehensible scene. According to the research explained in [3], quality of perception is influenced