IEEE JOURNAL OF SELECTED TOPICS IN SIGNAL PROCESSING

A Depth Map Post-Processing Technique for 3D-TV Systems based on Compression Artifact Analysis

D. V. S. X. De Silva, Graduate Student Member, IEEE, W. A. C. Fernando, Senior Member, IEEE, H. Kodikaraarachchi, Member, IEEE, S. T. Worrall, Member, IEEE, and A. M. Kondoz, Senior Member, IEEE

Contributed Paper

Abstract: Depth maps aid the generation of virtual viewpoints in multiview video applications. As the depth maps can be represented as a gray scale image sequence, they can be compressed using existing video codecs. However, when the depth maps are compressed using existing codecs, compression artifacts cause undesirable distortions in the rendered views. If the effects of compression artifacts can be minimized at the decoder, it would be possible to use the existing video codecs, which achieve significant compression ratios, to effectively compress depth maps. In this paper we theoretically analyze the effects of compression artifacts on the virtual view generation process. Based on this analysis, a set of guidelines is formulated for designing a post-processing filter that could effectively minimize the effects of compression artifacts. Assuming that depth maps are piecewise smooth images with sharp discontinuities, an adaptive bilateral filtering technique is proposed as an out-of-the-loop filter at the decoder to post-process the compressed depth maps. Histograms of the compressed depth maps are analyzed on a block basis to identify the dominant depth value bins in each block. Using the identified global depth value bins, the proposed technique successfully minimizes the artifacts by adjusting the histograms of the compressed depth maps. The experimental results suggest that the proposed depth map filtering technique can significantly improve the average perceptual quality of rendered views by up to 1.7 dB over the state-of-the-art techniques.
It is expected that the proposed post-processing technique will have important use cases in advanced 3-Dimensional Television systems.

Index Terms: Bilateral Filtering, Depth Map Compression, 3D-TV

All authors are with the I-Lab Multimedia Communications Research Center, Center for Vision Speech and Signal Processing, University of Surrey, Guildford, United Kingdom, GU2 7XH. Corresponding Author e-mail: Varuna De Silva.

I. INTRODUCTION

A monoscopic color image with a corresponding depth map is a popular method of representing 3-Dimensional (3D) or multiview video. The depth maps are used as an aid to generate multiple views in 3D Video (3DV) and Free Viewpoint Video (FVV) applications. The depth maps represent the per-pixel depth of a corresponding color image, and signal the disparity information needed by the virtual (novel) view rendering system. The depth maps can be represented as a gray scale image sequence for storage and transmission, and thus can be compressed with existing video codecs, such as H.264/AVC. Existing video codecs are optimized to encode image sequences that are ultimately viewed by end users. Depth maps, on the other hand, are not viewed by end users but are used as an aid for view rendering. Therefore, when existing video codecs are used to compress depth maps, the compression artifacts on depth maps cause distortions in the rendered views.

Two types of solutions can be identified in the existing literature to solve this problem. The first solution is to develop novel compression techniques suitable specifically for depth maps. This may involve new encoding techniques, such as platelet based depth map coding [1] or silhouette based techniques [2], or modification of the state-of-the-art video encoders to suit depth maps with techniques such as 3-D motion estimation [3], [4], new encoding mode selection strategies [5], [6], [7] or object based coding of depth maps [8]. A novel
lossless depth map compression technique is presented in Ref. [9]. The second type of solution encodes depth maps with an existing video codec as a sequence of images, and reconstructs (post-processes) the compressed depth maps at the decoder using image denoising techniques. In this type of solution, the existing video codecs are not modified to specifically suit depth maps, but image denoising techniques are employed to minimize the undesirable compression artifacts. This paper presents a solution of the second type. We consider a situation in which depth maps are encoded using a state-of-the-art video codec and propose a post-processing technique at the decoder based on bilateral filtering.

In the existing literature, depth maps are processed for different objectives, focusing mainly on improving the depth map generation process. Joint Bilateral Filtering (JBF) was used in Ref. [10] to align depth maps with their corresponding texture images. In Ref. [11], the authors use edge, motion and depth range information to improve the depth estimation in multiview video. The depth maps are smoothed with a symmetric Gaussian filter in Ref. [12] to reduce the number of visual holes generated during stereoscopic view generation. An asymmetric Gaussian filter was used in Ref. [13] to smooth the depth maps. The strength of that smoothing filter is kept lower in the horizontal direction than in the vertical direction, so objects are comparatively less deformed in the virtual viewpoints. A cross-trilateral median filter is proposed in Ref. [14] to minimize the initial mismatches arising in the disparity estimation process. The filter in Ref. [14] is able to improve the disparity estimation process, while aligning the depth map with its corresponding color image. The methods described in Refs. [10]-[14] focus on the production end of the 3D-TV chain.
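
As a point of reference for the bilateral filtering mentioned above, the following is a minimal sketch of a generic, textbook bilateral filter applied to a toy depth map. It is not the adaptive filter proposed in this paper. The function name `bilateral_filter`, the parameters `radius`, `sigma_s` and `sigma_r`, and the toy data are all assumptions made purely for illustration.

```python
import math


def bilateral_filter(depth, radius=2, sigma_s=2.0, sigma_r=10.0):
    """Generic bilateral filter on a 2D list of depth values.

    Illustrative only: sigma_s (spatial spread) and sigma_r (depth-range
    spread) are hypothetical parameter names, not values from the paper.
    """
    h, w = len(depth), len(depth[0])
    out = [[0.0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            centre = depth[y][x]
            num = 0.0
            den = 0.0
            for dy in range(-radius, radius + 1):
                for dx in range(-radius, radius + 1):
                    yy, xx = y + dy, x + dx
                    if 0 <= yy < h and 0 <= xx < w:
                        # Closeness weight: decays with spatial distance.
                        ws = math.exp(-(dx * dx + dy * dy) / (2.0 * sigma_s ** 2))
                        # Similarity weight: decays with depth difference,
                        # so pixels across a sharp depth edge barely count.
                        diff = depth[yy][xx] - centre
                        wr = math.exp(-(diff * diff) / (2.0 * sigma_r ** 2))
                        num += ws * wr * depth[yy][xx]
                        den += ws * wr
            # den >= 1 because the centre pixel always has weight 1.
            out[y][x] = num / den
    return out


def toy_depth(x, y):
    """Toy depth map: a sharp edge at x = 4 plus small synthetic noise."""
    if x < 4:
        return 50 + ((x + y) % 3 - 1) * 2
    return 200 + ((x * y) % 3 - 1) * 2


noisy = [[toy_depth(x, y) for x in range(8)] for y in range(6)]
filtered = bilateral_filter(noisy)
```

Because the similarity weight suppresses contributions from pixels on the other side of a depth discontinuity, the small fluctuations on each side are smoothed while the edge itself stays sharp, which is the property that makes this family of filters attractive for piecewise smooth depth maps.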

An initial effort to reconstruct depth maps at the receiving end, using a frequent-low-high filter, is presented in Ref. [15]. For each pixel, the contents of the filtering window are classified into two sets, based on the relative occurrence of the luminance values of its pixels. The two pixel sets are then represented by the mode of each set. However, the filtering process proposed in Ref. [15] severely lacks the ability to preserve the object structure and thus results in poor rendering quality. A joint trilateral filter (JTF) is proposed in Ref. [16] as an alternative to the in-loop deblocking filter in the H.264 encoder-decoder architecture. The JTF uses three factors to filter the depth maps, i.e., a closeness filter kernel, a similarity filter kernel from the depth map, and a similarity filter kernel from the corresponding color image. The method in Ref. [16] outperforms the depth reconstruction filter proposed in Ref. [15], due to the inherent edge preserving capability of the JTF. The drawback of the JTF is that it assumes perfect edge alignment between the color image and its corresponding depth map. Both methods in Refs. [15] and [16] are proposed as in-loop filters. We proposed a filter similar to the JTF as an out-of-the-loop filter in Ref. [17]. In Ref. [18] we proposed an adaptive bilateral filtering technique to adaptively sharpen the compressed depth maps at the decoder, thus reducing the effects of compression artifacts on rendered views. While the Bilateral Sharpening Filter (BSF) presented in