BUFFER REQUIREMENT ANALYSES FOR MULTIVIEW VIDEO CODING

Ying Chen¹, Ye-Kui Wang², Moncef Gabbouj¹
¹ Institute of Signal Processing, Tampere University of Technology
² Nokia Research Center

ABSTRACT

Multiview video coding (MVC), which is becoming an extension of H.264/AVC, is currently under development by the Joint Video Team (JVT). Compared to H.264/AVC, the main new compression tool in MVC is inter-view prediction, which, among other effects, substantially increases the required decoded picture buffer (DPB) size. Efficient buffer management for MVC is therefore highly desirable. In this paper, we analyze the minimum buffer requirements for a typical MVC coding structure under two coding methods, view-first coding and time-first coding. The analysis results are helpful in designing reference picture management or reference picture marking methods.

Index Terms— Multiview video coding, decoded picture buffer, reference picture marking, H.264/AVC

1. INTRODUCTION

Multiview video technologies have gained significant interest recently. Two typical applications are free-viewpoint video and 3D TV. In free-viewpoint video, the viewer can interactively choose his/her viewpoint in 3-D space to observe a real-world scene from preferred perspectives [1]. In 3D TV, different stereoscopic views are generated from video of the scene captured by multiple cameras. Compared to free-viewpoint video, 3D TV does not require interaction [2]. On the other hand, 3D TV usually requires displaying all the views, whereas free-viewpoint video displays only one view.

Due to the huge amount of data, particularly when the number of views to decode is large, the transmission part of a system for multiview video applications relies heavily on compression of the video captured by the cameras. Simulcast coding can be applied to multiview video by coding each view separately with a single-view video coder, e.g., one conforming to the H.264/AVC standard [3].
However, exploiting the correlation between views to further improve compression efficiency is of great interest. Inter-view prediction is supported in the latest draft specification of the multiview coding extension of H.264/AVC (MVC), which MPEG selected as the starting point for MVC after a subjective assessment of several candidate codecs. The latest draft of the video model of MVC is described in JMVM [4].

In MVC, a picture can use pictures of different views at the same time instance as inter-view prediction references. For each view, the information about which views may be used for inter-view prediction reference is included in the extension to the sequence parameter set (SPS). This information stays unchanged throughout a coded video sequence associated with the SPS.

Temporal scalability, which is supported by H.264/AVC [3], is inherited in MVC. The most typical coding structure for temporal scalability is the hierarchical B picture coding structure [5]. Typically, the hierarchical structure requires a larger DPB size than simple structures such as IPPP and IBBP. The two main H.264/AVC DPB management tools, reference picture list reordering (RPLR) and memory management control operation (MMCO) commands, are typically utilized in hierarchical B coding.

In H.264/AVC, reference pictures are marked as short-term or long-term pictures. There are two types of operations for reference picture marking: adaptive memory control and sliding window. Different reference picture marking operations can be applied to each picture independently. Adaptive memory control can explicitly mark a short-term or long-term picture as "unused for reference", while the sliding window operation is a first-in-first-out mechanism among short-term reference pictures.
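The difference between the two marking operations can be illustrated with a small Python sketch. It is a deliberately simplified model, not the normative H.264/AVC process: the class name `ToyRefMarker`, its methods, and the string picture labels are all hypothetical, and details such as frame number wrapping, field pictures, and the actual MMCO syntax are ignored.

```python
from collections import deque


class ToyRefMarker:
    """Simplified model of reference picture marking (illustrative only)."""

    def __init__(self, max_refs):
        self.max_refs = max_refs      # assumed cap on reference pictures
        self.short_term = deque()     # short-term references, oldest first
        self.long_term = []           # long-term references

    def sliding_window_add(self, pic):
        """Add a short-term reference; evict the oldest one if the cap is hit.

        Returns the evicted picture, or None if nothing was evicted.
        """
        evicted = None
        total = len(self.short_term) + len(self.long_term)
        if total >= self.max_refs and self.short_term:
            # First-in-first-out among short-term pictures only.
            evicted = self.short_term.popleft()
        self.short_term.append(pic)
        return evicted

    def mark_unused(self, pic):
        """Adaptive-memory-control style: explicitly drop a chosen picture.

        Works for short-term and long-term pictures; returns True if found.
        """
        for refs in (self.short_term, self.long_term):
            if pic in refs:
                refs.remove(pic)
                return True
        return False


marker = ToyRefMarker(max_refs=2)
marker.sliding_window_add("P0")
marker.sliding_window_add("P1")
evicted = marker.sliding_window_add("P2")   # oldest short-term picture leaves
removed = marker.mark_unused("P1")          # explicit removal, any position
```

In this toy model the sliding window can only ever remove the oldest short-term picture, whereas the explicit operation can remove any picture, which is why hierarchical structures typically rely on the latter.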
Because more than one view is encoded in MVC and inter-view prediction is employed, the DPB size required for decoding an MVC bitstream can be very large, as can be seen from the buffer requirement analyses presented later in this paper. Designing optimal buffer management for pictures that are used for prediction reference or are waiting for output, taking into account coding order, temporal scalability and view scalability, is therefore crucial for controlling memory resources.

In this paper, we present DPB analyses of the minimum buffer requirements for the most typical coding structure included in JMVM [4] under two different coding methods, time-first coding and view-first coding. The prediction structure is represented as a binary tree to ease the analyses. The analysis results are helpful in designing reference picture management methods, in particular reference picture marking methods, for multiview video coding. For example, the authors have utilized the results in their MVC reference picture marking proposal [6], which has been adopted into the Joint Draft of MVC [7].
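As a rough illustration of how the two coding methods interleave pictures, the following Python sketch enumerates the coding order of (view, time) pairs. It assumes the usual meaning of the terms, which this section does not define: in time-first coding all views of one time instance are coded before the next time instance, and in view-first coding all pictures of one view are coded before the next view. The function name `coding_order` and its parameters are hypothetical, and the per-view temporal order is simply passed in rather than derived from the hierarchical B structure.

```python
def coding_order(view_order, time_order, method):
    """Return the list of (view, time) pairs in coding order.

    view_order: views in their coding order, e.g. [0, 1, 2]
    time_order: time instances in their per-view coding order
    method:     "time-first" or "view-first" (assumed definitions)
    """
    if method == "time-first":
        # Outer loop over time: finish every view of a time instance first.
        return [(v, t) for t in time_order for v in view_order]
    if method == "view-first":
        # Outer loop over views: finish every picture of a view first.
        return [(v, t) for v in view_order for t in time_order]
    raise ValueError(f"unknown coding method: {method!r}")


# Two views, three time instances coded in the order 0, 2, 1
# (a B picture at time 1 predicted from times 0 and 2).
time_first = coding_order([0, 1], [0, 2, 1], "time-first")
view_first = coding_order([0, 1], [0, 2, 1], "view-first")
```

The two lists contain the same pictures but in different orders, so the set of pictures that must be held in the DPB at any given moment differs between the methods.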