Abstract— This paper presents a solution for inserting
graphics in a 3D HDTV environment in real time. It focuses on
optimizing the insertion process in order to overcome
depth cue conflicts between occlusion and stereoscopic
disparity caused by the insertion, thus providing a method to
control the 3D quality of experience (QoE) after the insertion. The authors propose an
algorithm based on disparity estimation modified to meet the
needs of this particular task. Building on Census transform
disparity estimation, the introduction of reduced disparity maps
and mode filters makes it possible to increase the disparity
range and achieve subjective visual improvements while limiting
the amount of physical resources needed. Prioritizing visual discomfort
avoidance over objective stereoscopic analysis performance is
a key point of the study, because it makes it possible to
avoid heavy algorithms while keeping the quality of experience at
a high level. The system is intended to add on-screen displays, device or
channel logos, advertising or even subtitles to a 3D HDTV
workflow on a general hardware device, meaning that it can
fit even in consumer electronic devices, not only in professional
equipment.
Index Terms—3D High Definition Television (3D-HDTV),
Field Programmable Gate Array (FPGA), Census Transform,
Stereo Vision, Real-Time Video Processing.
I. INTRODUCTION
WHEN dealing with two-dimensional video, inserting new
graphical elements such as subtitles, message boards,
On-Screen-Displays (OSD) [1] or channel logos is just a
matter of overlaying images. When the overlaid graphic
visually blocks an object of the original image, the graphic
appears nearer to the observer than the rest of the image.
3DTV and most 3D cinema are based on stereo vision
systems, offering the viewer a pair of stereo images, one for
each eye, where the difference between an object’s position in
the two images is interpreted by the human brain as a difference
in depth. In this case, overlaying graphics on both the left
and right images is not enough, as the depth information now
depends not only on visual blocking, but also on the disparity
between the two images. Consequently, when a graphic is
inserted in the image, both its disparity and its occlusions
should be coherent. In order to achieve this goal it is necessary
to know the depth of the different objects that make up the
scene. An object situated closer to the observer than the
graphic to be inserted should occlude the graphic, while the
graphic should occlude the object if its disparity between
stereo images indicates that it is closer to the observer than the object.
Manuscript received July 14, 2011.
The authors are with the Signals, Systems and Radiocommunications
Department of the E.T.S. Ingenieros de Telecomunicación, UPM, 28040,
Madrid, Spain (e-mail: jrf@gatv.ssr.upm.es; djb@gatv.ssr.upm.es;
jmm@gatv.ssr.upm.es).
Thus, proper insertion of a graphic requires
information about the depth of the scene. Given the
stereoscopic nature of the three-dimensional effect, a classical
stereo vision approach to disparity estimation meets the
need for a depth map.
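The occlusion rule described above can be illustrated with a minimal sketch for a single scanline of one view. It is not the authors' hardware implementation: the function name `insert_graphic`, the `None` marker for transparent graphic pixels, and the convention that a larger disparity value means "closer to the viewer" are all assumptions made for illustration.

```python
def insert_graphic(row, disparity_row, graphic_row, graphic_disparity):
    """Composite one scanline of a graphic over one scanline of a view.

    Assumption: a larger disparity value means the point is closer to
    the viewer. None in graphic_row marks a transparent graphic pixel.
    """
    out = list(row)
    for x, g in enumerate(graphic_row):
        if g is None:
            continue  # nothing to draw at this position
        if disparity_row[x] > graphic_disparity:
            continue  # scene object is nearer, so it occludes the graphic
        out[x] = g    # graphic is nearer (or equal), so it occludes the scene
    return out


scene = [10, 10, 10, 10]
scene_disparity = [2, 8, 2, 8]      # hypothetical per-pixel disparities
graphic = [99, 99, None, None]
result = insert_graphic(scene, scene_disparity, graphic, graphic_disparity=5)
```

In this toy case the graphic is drawn only at the first pixel; at the second pixel the scene disparity (8) exceeds the graphic disparity (5), so the scene pixel is kept. In a stereo pair the same rule would be applied to each view, with the graphic shifted horizontally between views by its chosen disparity.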
Real-time processing is highly computationally demanding.
That is why most real-time stereo systems adopt simple
algorithms, which are usually much faster than complex ones.
When the depth map obtained from these algorithms is used
for a specific purpose, some further processing can be applied in
order to mitigate this lower performance.
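The abstract names Census transform disparity estimation as the basis of the proposed algorithm. As a rough software sketch of that family of simple matchers (not the authors' modified FPGA design), the code below computes a 3x3 census signature and picks the disparity with the lowest Hamming distance. The window size, the search direction (left pixel `x` matched against right pixel `x - d`), and all function names are assumptions.

```python
def census_3x3(img, y, x):
    """Return an 8-bit signature: 1 where a neighbour is darker than the centre."""
    centre = img[y][x]
    bits = 0
    for dy in (-1, 0, 1):
        for dx in (-1, 0, 1):
            if dy == 0 and dx == 0:
                continue
            bits = (bits << 1) | int(img[y + dy][x + dx] < centre)
    return bits


def hamming(a, b):
    """Number of differing bits between two census signatures."""
    return bin(a ^ b).count("1")


def disparity_at(left, right, y, x, max_disp):
    """Winner-takes-all search along the scanline for an interior pixel (y, x)."""
    ref = census_3x3(left, y, x)
    best_d, best_cost = 0, None
    for d in range(max_disp + 1):
        if x - d < 1:
            break  # the 3x3 window would leave the image
        cost = hamming(ref, census_3x3(right, y, x - d))
        if best_cost is None or cost < best_cost:
            best_d, best_cost = d, cost
    return best_d


# Toy stereo pair: the right view is the left view shifted by 2 pixels.
p = [1, 2, 3, 4, 5, 9, 6, 7, 8, 10]
left = [p[:] for _ in range(3)]
right = [p[2:] + [0, 0] for _ in range(3)]
d = disparity_at(left, right, 1, 5, max_disp=3)
```

Because the census signature depends only on intensity ordering within the window, the matching cost reduces to bit comparisons, which is one reason this kind of matcher maps well to pipelined hardware.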
Visual discomfort caused by false occlusions is not
deterministic, so there are no objective methods defined to
determine the quality of such an algorithm. False occlusions
cause visual discomfort when the viewer perceives an
inconsistency between the disparity of an object and the
occlusions that it creates. A small number of scattered errors in
a depth map will have little effect on a global error measure,
but can create annoying discontinuities in inserted graphics.
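The abstract mentions mode filters as one of the tools applied to the disparity map. The sketch below shows how a generic mode filter removes isolated outliers of the kind described above. The square window, the truncated windows at the borders, and the tie-breaking behaviour (the first most common value in scan order) are assumptions; the source does not specify the filter used in the hardware.

```python
from collections import Counter


def mode_filter(disp, radius=1):
    """Replace each disparity with the most frequent value in its window.

    Windows are truncated at the image borders (an assumption).
    """
    h, w = len(disp), len(disp[0])
    out = [[0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            window = [
                disp[yy][xx]
                for yy in range(max(0, y - radius), min(h, y + radius + 1))
                for xx in range(max(0, x - radius), min(w, x + radius + 1))
            ]
            out[y][x] = Counter(window).most_common(1)[0][0]
    return out


# A flat region at disparity 4 with one isolated wrong estimate (9).
noisy = [
    [4, 4, 4],
    [4, 9, 4],
    [4, 4, 4],
]
cleaned = mode_filter(noisy)
```

A single wrong pixel barely changes an average error figure, yet without filtering it would punch a visible hole in an inserted graphic; the mode filter replaces it with the locally dominant disparity.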
The proposed hardware architecture is fully pipelined and
comprises the whole path of the video signal, from dual-link
reception to transmission of an HD-SDI multiplexed 3D composed video
signal. It has been designed to fit a Virtex-II Pro FPGA.
The system is driven by a global clock that runs at
74.25 MHz in order to achieve HD-SDI processing
(1920×1080 pixel resolution at 25 frames/s or 1280×720 at 50
frames/s), but, as the system is fully pipelined, the resolution
or frame rate can be increased, up to a global clock of
200 MHz.
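The relation between the 74.25 MHz clock and the two video formats can be checked with a short calculation. It assumes one pixel is processed per clock cycle and uses the total raster sizes (active picture plus blanking) of the standard HD rasters, 2640×1125 for 1080-line video at 25 frames/s and 1980×750 for 720-line video at 50 frames/s. These raster totals are not given in the text and are stated here as assumptions.

```python
PIXEL_CLOCK_HZ = 74_250_000
MAX_CLOCK_HZ = 200_000_000

# (total samples per line, total lines per frame, frames per second),
# blanking included; assumed standard HD raster totals.
FORMATS = {
    "1920x1080 @ 25 frames/s": (2640, 1125, 25),
    "1280x720 @ 50 frames/s": (1980, 750, 50),
}


def required_clock_hz(samples_per_line, lines_per_frame, fps):
    """Clock needed to process one sample per cycle for the full raster."""
    return samples_per_line * lines_per_frame * fps


clocks = {name: required_clock_hz(*raster) for name, raster in FORMATS.items()}

# Headroom available if the pipeline is clocked at its 200 MHz limit.
headroom = MAX_CLOCK_HZ / PIXEL_CLOCK_HZ
```

Both formats require exactly 74.25 MHz under these assumptions, and a 200 MHz clock would leave roughly 2.7 times that pixel throughput for higher resolutions or frame rates.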
II. RELATED WORK
A. Augmented Reality
Inserting graphics in a 3D-HDTV video flow can be
regarded as an augmented reality situation. In augmented
reality, virtual elements are introduced into a real scene.
Graphic enhancement has to deal with depth cue conflicts when
a 3D display is used. Studies such as [2] state the relevance of
both occlusion and disparity cues and how incoherence
between them can prevent a viewer from understanding the
depth map of a scene. Aligning virtual and real objects is
necessary to create a comprehensible situation. Quality of
perception, based on research explained in [3], is influenced
Real-Time 3-D HDTV Depth Cue Conflict
Optimization
Juan Antonio Rodrigo, David Jiménez and José Manuel Menéndez
2011 IEEE International Conference on Consumer Electronics - Berlin (ICCE-Berlin)
978-1-4577-0234-1/11/$26.00 ©2011 IEEE