Depth map compression for real-time view-based rendering

Bing-Bing Chai *, Sriram Sethuraman, Harpreet S. Sawhney, Paul Hatrack
Sarnoff Corporation, 201 Washington Rd., Princeton, NJ 08543-5300, USA

Abstract

Realistic and interactive telepresence has been a hot research topic in recent years. Enabling telepresence using depth-based new view rendering requires the compression and transmission of video as well as dynamic depth maps from multiple cameras. The telepresence application places additional requirements on the compressed representation of depth maps, such as preservation of depth discontinuities, low-complexity decoding, and amenability to real-time rendering using graphics cards. We propose an adaptation of an existing triangular mesh generation method for a depth representation that can be encoded efficiently. The mesh geometry is encoded using a binary tree structure in which single-bit flags mark the splits of triangles, and the depth values at the tree nodes are differentially coded. By matching the tree traversal to the mesh rendering order, both depth map decoding and triangle-strip generation for efficient rendering are achieved simultaneously. The proposed scheme also lends itself naturally to coding segmented foreground layers and to providing error resilience. At similar compression ratios, new view generation using the proposed method provided quality similar to depth compression using JPEG2000. However, the new mesh-based depth map representation and compression method showed a significant improvement in rendering speed when compared to using separate compression and rendering processes.

© 2004 Elsevier B.V. All rights reserved.

Keywords: Depth map compression; View-based rendering; Triangular mesh; 3D video stream; Foreground/background separation

1. Introduction

In the past few years, there has been an increased interest in 3D scene rendering. It has been used in sports events such as the Super Bowl, and in movies and TV commercials.
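The binary-tree mesh coding summarized in the abstract can be illustrated with a small sketch. The Python below is a hypothetical illustration, not the paper's exact algorithm: the triangle layout, the split criterion (midpoint prediction error against a threshold), and all function names are assumptions. Each triangle contributes one split flag; when a triangle splits, the depth at its hypotenuse midpoint is coded differentially against the mean of the hypotenuse endpoints, mirroring the single-bit split flags and differential node depths described above.

```python
import numpy as np

def encode_tri(depth, v0, v1, va, flags, residuals, thresh, max_level, level=0):
    """Recursively encode one right triangle of a binary triangle tree.

    v0, v1: endpoints of the hypotenuse; va: the apex vertex (row, col tuples).
    Emits one split flag per triangle; on a split, the depth at the hypotenuse
    midpoint is coded as a residual against its prediction (the mean of the
    hypotenuse endpoint depths), which tends to preserve depth discontinuities.
    """
    mid = ((v0[0] + v1[0]) // 2, (v0[1] + v1[1]) // 2)
    pred = (int(depth[v0]) + int(depth[v1])) // 2
    err = abs(int(depth[mid]) - pred)
    if level >= max_level or mid in (v0, v1) or err <= thresh:
        flags.append(0)  # leaf: the flat triangle approximates depth well enough
        return
    flags.append(1)                              # split flag (single bit)
    residuals.append(int(depth[mid]) - pred)     # differential depth at midpoint
    # Both children take the hypotenuse midpoint as their new apex.
    encode_tri(depth, va, v0, mid, flags, residuals, thresh, max_level, level + 1)
    encode_tri(depth, v1, va, mid, flags, residuals, thresh, max_level, level + 1)

def encode_depth_map(depth, thresh=2, max_level=8):
    """Encode a square (2^k + 1)-sized depth map as two base triangles."""
    n = depth.shape[0] - 1
    flags, residuals = [], []
    # Two base triangles sharing the main diagonal as their hypotenuse.
    encode_tri(depth, (0, 0), (n, n), (0, n), flags, residuals, thresh, max_level)
    encode_tri(depth, (n, n), (0, 0), (n, 0), flags, residuals, thresh, max_level)
    return flags, residuals
```

On a planar (ramp) depth map the midpoint prediction is exact everywhere, so both base triangles become leaves with no residuals, while a depth step forces splits that concentrate along the discontinuity.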
Traditional rendering methods model the complete geometry and texture of a 3D scene or object. A polygonal mesh representation of the 3D geometry is the typical representation used to enable fast rendering on graphics hardware. Simplification and compression of such mesh representations have received considerable attention in the graphics community over the past few years (Taubin and Rossignac, 1998; Khodakovsky et al., 2000). These works have concentrated primarily on static models, where the models are created off-line and decompression and rendering have no time constraints.

Recently, image-based rendering (IBR) techniques have been proposed in the computer vision/graphics communities. Unlike traditional rendering methods, IBR methods synthesize arbitrary views of a scene from a collection of images observed from known viewpoints. Depending on the amount of 3D information being employed, a

* Corresponding author. E-mail address: bchai@sarnoff.com (B.-B. Chai).

0167-8655/$ - see front matter © 2004 Elsevier B.V. All rights reserved.
doi:10.1016/j.patrec.2004.01.002

Pattern Recognition Letters 25 (2004) 755–766
www.elsevier.com/locate/patrec