OBJECT-BASED CODING FOR KINECT DEPTH AND COLOR VIDEOS

Cuiling Lan*, Jizheng Xu#, and Feng Wu#

*Xidian University, Xi’an, China
#Microsoft Research Asia

ABSTRACT

Simultaneous capture of color and depth videos, e.g. with Kinect, benefits many applications and has become very popular. Efficient representation and compression of such data is important yet challenging. In this paper, we design an object-based coding system to compress Kinect-like depth and color videos. Segmentation is first conducted to obtain the different object planes, which are identified by a mask image. We compress the depth and color images separately using the proposed object-based codec, which is built on High Efficiency Video Coding (HEVC). The mask image is losslessly compressed by adding a new context-based mode to HEVC. To ensure that the object boundaries in the depth image align with those in the color image, the depth image is pre-processed. Coding the object planes of the depth image separately avoids the inefficient coding of edge blocks at object boundaries and thus brings an obvious coding gain. Moreover, the attractive “content-based” coding functionality, which permits transmission of only the object planes of interest rather than the entire image, provides a practical way to reduce the bitrate.

Index Terms: Kinect, depth and color, object-based coding

1. INTRODUCTION

Recently, many consumer products that capture both color and depth images have become available, such as Microsoft Kinect, which uses an infrared structured-light system to sense depth [1]. Kinect has a projector that emits infrared patterns and a camera that receives them. The depth is derived from the offsets of the local patterns. In addition, a color camera captures the color video simultaneously.
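As a rough illustration of how depth can be derived from pattern offsets, the sketch below applies the textbook triangulation relation Z = f·b/d for a projector-camera pair. This is a simplified model and not the sensor's actual processing. The function name and the focal length and baseline values are hypothetical, chosen only for illustration.

```python
def depth_from_disparity(disparity_px, focal_px, baseline_m):
    """Textbook triangulation: depth Z = f * b / d.

    disparity_px: offset of a local pattern in pixels (must be positive)
    focal_px:     focal length of the infrared camera in pixels (assumed value)
    baseline_m:   projector-to-camera distance in meters (assumed value)
    """
    if disparity_px <= 0:
        raise ValueError("disparity must be positive")
    return focal_px * baseline_m / disparity_px


# Illustrative numbers only; they are not taken from the paper.
FOCAL_PX = 580.0
BASELINE_M = 0.075

near = depth_from_disparity(58.0, FOCAL_PX, BASELINE_M)
far = depth_from_disparity(29.0, FOCAL_PX, BASELINE_M)
print(near, far)
```

Under this model a larger pattern offset corresponds to a closer surface, so depth resolution degrades with distance.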
Easy access to both color and depth images benefits many applications in computer vision and graphics, such as action detection in human-computer interaction, object recognition, tracking, foreground and background segmentation, and 3D reconstruction [2]. Consequently, highly efficient compression strategies are much desired for storing or transmitting such large amounts of depth and color data.

Much excellent work has been done on image and video compression. The state-of-the-art standard H.264/AVC and High Efficiency Video Coding (HEVC) [3], which is under development, provide high coding performance. Data redundancy is reduced by exploiting spatial and temporal correlations by means of prediction and transform. Both belong to the category of image-based compression, where an entire image is compressed regardless of its content. However, in many applications such as interactive media, video conferencing, and surveillance, it is unnecessary to transmit the entire image when only some regions are of interest. The MPEG-4 video-object coding standard [4][5] introduced an attractive feature that supports access to “objects” within video scenes. This offers a way to reduce the bitrate while preserving the image content of interest, by decoding only the object planes of interest. Liu et al. extended object-based coding to H.264/AVC to provide this feature [6]. For multi-view video coding, Ng et al. constructed an object-based coding system [7].

Most of these compression standards are designed for color images. They are less efficient for depth images because they do not take the characteristics of depth into account. A depth image usually has steep changes across object boundaries and smooth content within objects. The conventional 2D transform is inefficient for such high-frequency content when prediction fails.
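The inefficiency of a 2D transform on edge blocks can be illustrated with a small numerical sketch. It compares how many DCT coefficients remain significant for a smooth 8x8 block and for a block crossed by a diagonal step edge. The floating-point DCT-II, the block contents, and the magnitude threshold (a crude stand-in for quantization) are all assumptions made for illustration. They are not the HEVC integer transform or data from the paper.

```python
import numpy as np


def dct_matrix(n):
    """Orthonormal DCT-II basis matrix of size n x n."""
    k = np.arange(n).reshape(-1, 1)   # frequency index (rows)
    i = np.arange(n).reshape(1, -1)   # sample index (columns)
    m = np.sqrt(2.0 / n) * np.cos(np.pi * (2 * i + 1) * k / (2 * n))
    m[0, :] = np.sqrt(1.0 / n)        # DC row uses a different scale
    return m


def dct2(block):
    """Separable 2D DCT of a square block."""
    d = dct_matrix(block.shape[0])
    return d @ block @ d.T


def count_significant(coeffs, threshold=1.0):
    """Number of coefficients whose magnitude exceeds the threshold."""
    return int(np.sum(np.abs(coeffs) > threshold))


n = 8
rows, cols = np.mgrid[0:n, 0:n]

# Smooth depth inside an object: a gentle horizontal ramp.
smooth_block = 100.0 + 0.5 * cols
# Object boundary: a diagonal step between two depth levels.
edge_block = np.where(rows + cols < n, 50.0, 200.0)

smooth_count = count_significant(dct2(smooth_block))
edge_count = count_significant(dct2(edge_block))
print("significant coefficients (smooth, edge):", smooth_count, edge_count)
```

The smooth block concentrates its energy in a few low-frequency coefficients, while the step edge spreads energy over many coefficients, each of which costs bits to code.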
Much research on depth compression has been devoted to improving coding efficiency in edge regions. Approaches using edge-aware prediction and transform are proposed in [8-10] to avoid operations across edges, where the edge masks are explicitly coded. In [10], a shape-adaptive wavelet whose filter supports do not cross edges is designed to compress both color and depth, with a shared edge mask explicitly coded. Furthermore, descriptive approaches that use a linear model to approximate the depth plane are proposed in [11][12]. As far as we know, none of the above strategies supports the object-based coding feature. In [13], object-based coding is applied to depth compression, where the depth block is modeled by linear functions. However, the same codec is not suitable for compressing the color image. An object-based coding system suitable for both color and depth compression is therefore desirable.

*This work was done when C. Lan was with Microsoft Research Asia as an intern. This work was supported by NSF of China (Nos. 61033004, 61070138, 61072104, 61003148).

In this paper, we propose an object-based coding system for Kinect-like depth and color compression in HEVC. It has the following advantages. First, it achieves a consistent coding design for