1057-7149 (c) 2016 IEEE. Personal use is permitted, but republication/redistribution requires IEEE permission. See http://www.ieee.org/publications_standards/publications/rights/index.html for more information. This article has been accepted for publication in a future issue of this journal, but has not been fully edited. Content may change prior to final publication. Citation information: DOI 10.1109/TIP.2017.2656463, IEEE Transactions on Image Processing

Depth Map Super-Resolution Considering View Synthesis Quality

Jianjun Lei, Member, IEEE, Lele Li, Huanjing Yue, Feng Wu, Fellow, IEEE, Nam Ling, Fellow, IEEE, and Chunping Hou

Abstract—Accurate, high-quality depth maps are required in many 3D applications, such as multi-view rendering, 3D reconstruction, and 3DTV. However, the resolution of a captured depth map is typically much lower than that of its corresponding color image, which degrades performance in these applications. In this paper, we propose a novel depth map super-resolution (SR) method that takes view synthesis quality into account. The proposed approach makes two main technical contributions. First, since the captured low-resolution (LR) depth map may be corrupted by noise and occlusion, we propose a credibility-based multi-view depth map fusion strategy that considers both view synthesis quality and inter-view correlation to refine the LR depth map. Second, we propose a view-synthesis-quality-based trilateral depth map up-sampling method, whose up-sampling filter jointly considers depth smoothness, texture similarity, and view synthesis quality. Experimental results demonstrate that the proposed method outperforms state-of-the-art depth SR methods in terms of both super-resolved depth maps and synthesized views. Furthermore, the proposed method is robust to noise and achieves promising results under noise-corrupted conditions.
Index Terms—Depth map, super-resolution, depth fusion, view synthesis quality, up-sampling filter.

I. INTRODUCTION

Nowadays, depth information is widely used in modern applications, such as 3D reconstruction [1], 3DTV [2], and pose recognition [3]. While high-quality texture information can easily be captured by popular color cameras, depth information is hard to capture precisely. Mainstream depth acquisition methods can be classified into three categories: stereo matching methods, laser scanning methods, and range sensing methods.

Stereo matching methods, also known as passive methods, compute depth information from two-view or multi-view images via correspondence matching and triangulation [4]. However, they involve high computational complexity, and their performance is strongly affected by occlusion and by the distribution of textures [5]. Laser scanning methods can obtain accurate, high-quality depth maps via slice-by-slice scanning of the targeted scene [6]. However, this process is time-consuming and can only be used for static scenes, which rules out dynamic scenes.

Manuscript received February 9, 2016; revised June 28, 2016, November 13, 2016, and December 19, 2016; accepted January 13, 2017. Date of publication January, 2017. This work was supported by the Natural Science Foundation of China (No. 61271324, 61520106002, 61471262, 91320201, 61672378). J. Lei, L. Li, H. Yue (corresponding author), and C. Hou are with the School of Electrical and Information Engineering, Tianjin University, Tianjin 300072, China (e-mail: jjlei@tju.edu.cn, lele1992@tju.edu.cn, dayueer@tju.edu.cn, hcp@tju.edu.cn). F. Wu is with the School of Information Science and Technology, University of Science and Technology of China, Hefei 230026, China (e-mail: fengwu@ustc.edu.cn). N. Ling is with the Department of Computer Engineering, Santa Clara University, Santa Clara, CA 95053, USA (e-mail: nling@scu.edu).
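The triangulation step behind passive stereo can be sketched with the standard rectified-stereo relation Z = f·B/d (depth from focal length, baseline, and disparity). The following is a minimal illustration, not part of the paper's method; the focal length and baseline values are arbitrary assumptions.

```python
import numpy as np

def depth_from_disparity(disparity, focal_px, baseline_m, eps=1e-6):
    """Convert a disparity map (in pixels) to a depth map (in meters)
    via the rectified-stereo relation Z = f * B / d.
    Pixels with (near-)zero disparity are marked invalid (depth 0)."""
    disparity = np.asarray(disparity, dtype=np.float64)
    depth = np.zeros_like(disparity)
    valid = disparity > eps
    depth[valid] = focal_px * baseline_m / disparity[valid]
    return depth

# Illustrative camera parameters: 700 px focal length, 10 cm baseline.
d = np.array([[35.0, 7.0],
              [0.0, 70.0]])
z = depth_from_disparity(d, focal_px=700.0, baseline_m=0.10)
# 700 * 0.10 = 70: disparities 35, 7, 70 px map to 2 m, 10 m, 1 m;
# the zero-disparity pixel stays 0 (invalid).
```

The inverse relation also explains why stereo depth degrades for distant, occluded, or textureless regions: small disparity errors there translate into large depth errors.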
Compared with the above two categories, range sensing methods, which use active depth sensors such as the Microsoft Kinect [7] and Time-of-Flight (ToF) cameras [8] to capture depth information, have attracted increasing research attention in recent years. These methods are cheaper than laser scanning and can be used in dynamic scenes. However, due to hardware limitations, the resolution of depth maps captured by mainstream depth sensors is much lower than that of color images, and the depth acquisition process is easily affected by noise [9]. In depth image based rendering (DIBR), the resolution of the depth map must match that of the color image [10]. Therefore, it is highly desirable to develop effective and efficient depth super-resolution (SR) techniques.

Depth SR is an ill-posed problem, which requires introducing regularization priors to make it well-posed. Depth-plus-color methods, which utilize color information to guide the depth SR process, have achieved great success in recent years. For example, the color image can be used to steer an up-sampling filter, such as the joint bilateral filter [11], the weighted mode filter [12], or the edge guided filter [9], or to construct a Markov random field that models the relationship between the depth map and its corresponding color image [13]. However, these methods may produce artifacts where color discontinuities are not consistent with depth discontinuities. In addition, they only take depth quality into consideration, ignoring the quality of virtual views synthesized with the up-sampled depth maps. Another promising class of depth SR methods comprises multiple-depth based methods, which fuse multiple low-resolution (LR) depth maps to super-resolve a higher-resolution range map [14], [15]. In recent years, approaches that combine information from multiple color cameras and depth sensors have emerged and produced acceptable results for 3D reconstruction [16].
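To make the color-guided up-sampling idea concrete, here is a minimal sketch in the spirit of joint bilateral up-sampling: each high-resolution depth value is a weighted average of nearby low-resolution depth samples, weighted by spatial distance and by guide-image similarity. This is not the paper's trilateral filter (which adds a view-synthesis-quality term); the window radius and sigma values are illustrative assumptions, not tuned settings.

```python
import numpy as np

def joint_bilateral_upsample(depth_lr, guide_hr, factor,
                             sigma_s=1.0, sigma_r=0.1, radius=2):
    """Up-sample depth_lr by `factor` using a high-resolution guide
    image: spatial weights are computed in LR grid units, range
    weights from guide intensity similarity (joint bilateral idea)."""
    h_lr, w_lr = depth_lr.shape
    h_hr, w_hr = guide_hr.shape
    out = np.zeros((h_hr, w_hr))
    for y in range(h_hr):
        for x in range(w_hr):
            yc, xc = y / factor, x / factor              # HR pixel in LR coordinates
            y0 = min(int(round(yc)), h_lr - 1)
            x0 = min(int(round(xc)), w_lr - 1)
            num = den = 0.0
            for dy in range(-radius, radius + 1):
                for dx in range(-radius, radius + 1):
                    yl, xl = y0 + dy, x0 + dx
                    if not (0 <= yl < h_lr and 0 <= xl < w_lr):
                        continue
                    # spatial weight: distance in the LR grid
                    ws = np.exp(-((yl - yc) ** 2 + (xl - xc) ** 2)
                                / (2 * sigma_s ** 2))
                    # range weight: guide similarity between the HR pixel
                    # and the guide intensity at the LR sample's position
                    gy = min(int(yl * factor), h_hr - 1)
                    gx = min(int(xl * factor), w_hr - 1)
                    wr = np.exp(-(guide_hr[y, x] - guide_hr[gy, gx]) ** 2
                                / (2 * sigma_r ** 2))
                    num += ws * wr * depth_lr[yl, xl]
                    den += ws * wr
            out[y, x] = num / den if den > 0 else depth_lr[y0, x0]
    return out

# Sanity check: a constant LR depth map stays constant after up-sampling,
# since a weighted average of a constant is that constant.
rng = np.random.default_rng(0)
depth_lr = np.full((2, 2), 5.0)
guide = rng.random((4, 4))
up = joint_bilateral_upsample(depth_lr, guide, factor=2)
```

The texture-copying artifact mentioned above is visible directly in the `wr` term: wherever the guide image has an intensity edge that the true depth lacks (or vice versa), the range weight suppresses the wrong neighbors and the filter transfers the color discontinuity into the depth map.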
Recently, Choi et al. proposed improving the quality of multiple-view depth maps by increasing their spatial resolution while enforcing inter-view coherence [17]. Compared with single-view SR methods, the method in [17] handles noise and occlusions well and produces better results. However, it ignores the influence of the depth map on the synthesized view. Meanwhile, the 3D video coding standard has introduced view synthesis quality into the quality evaluation of depth map coding [18]. Based on the above observations, we propose a novel