ADAPTIVE M-BAND HIERARCHICAL FILTERBANK FOR COMPLIANT TEMPORAL SCALABILITY IN H.264 STANDARD

C. Bergeron, C. Lamy-Bergot*
THALES Land & Joint Systems, EDS/SPM Department, F-92704 Colombes

B. Pesquet-Popescu†
ENST, TSI Department, F-75013 Paris

ABSTRACT
This paper presents a temporal scalability solution for H.264/MPEG-4 AVC encoded video bitstreams. Achieved through the concept of adaptive M-band hierarchical filterbanks, the temporal scalability relies on a frame shuffling operation that preserves backward compatibility with the standard. Simulation results show that this scalability is obtained with no degradation in terms of subjective or objective quality.

1. INTRODUCTION
Following the ever-increasing demand for an efficient, simple and easily applicable video coding standard suitable for settings as different as wired and wireless communications, ITU-T and ISO have established a common specification, denoted H.264 or MPEG-4 AVC [1], which provides a significant compression gain over former standards and is easily adaptable to networked applications.

Targeting applications as diverse as videophony over wired or wireless links, high-quality video streaming over satellite, or lower-quality streaming over the Internet, H.264 presents one major drawback for channel-varying applications: it does not include scalability. Solutions are currently being proposed in the literature and within the SVC standardisation group to remedy this problem; they generally modify the H.264 syntax to integrate PFGS (progressive fine granularity scalability) coding or subband decompositions [2, 3]. In the meantime, motion-compensated (MC) spatio-temporal subband decompositions have gained a lot of interest due to their fine granular spatial/temporal/SNR scalability features combined with state-of-the-art compression performance [4].
In particular, the temporal scalability in these codecs is achieved through multi-resolution dyadic (and even triadic [5]) filterbanks. However, these structures are open-loop, and some of the powerful tools in H.264/AVC, such as the in-loop deblocking filter, are not easy to apply.

* This work was partially supported by the European Community through the project IST-FP6-001812 PHOENIX.
† This work was partially supported by the European Community through the project IST-FP6-1-507113 DANAE.

In this paper, we present temporally scalable solutions that are fully compliant with H.264/AVC and show that they can be easily interpreted and generalized in the framework of adaptive M-band hierarchical filterbanks. They combine a hierarchical representation with a closed-loop structure and preserve (or even improve) the coding performance of the original non-scalable scheme.

The paper is organised as follows. Section 2 introduces the proposed hierarchical filterbank structures and discusses their interest for video coding and scalability. An application of such filterbanks is proposed and discussed in Section 3. Section 4 describes a practical setup for easily applying the filtering to an H.264 codec in a compliant way, through the use of an interleaver. Finally, experimental results are presented in Section 5, and conclusions are drawn in Section 6.

2. M-BAND HIERARCHICAL FILTERBANKS
We propose a generic filterbank structure that provides a hierarchy of output subbands, containing one “intra” (equivalent to a low-pass) subband and several levels of detail subbands, ordered according to their importance. Each detail subband is obtained through a closed prediction loop, which differs from existing temporal wavelet schemes. Fig. 1 illustrates the proposed concept for a scalability factor of 2 and two detail levels, but this construction is easily generalized to other subsampling factors.
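To see why a closed prediction loop matters, the following Python sketch compares open-loop and closed-loop prediction on a toy signal in which each "frame" is a single number and each detail frame is coded as a quantized residual. It is a minimal illustration under simplifying assumptions (scalar frames, a uniform quantizer, sequential rather than hierarchical prediction); the function names `quantize` and `encode_chain` are hypothetical and belong neither to the paper nor to H.264.

```python
from __future__ import annotations


def quantize(x: float, step: float) -> float:
    """Uniform quantizer: round x to the nearest multiple of step."""
    return step * round(x / step)


def encode_chain(frames: list[float], step: float, closed_loop: bool) -> list[float]:
    """Hypothetical toy codec: code frames[0] directly (the "intra" frame),
    code every later frame as a quantized residual against the previous
    frame, and return what the decoder reconstructs."""
    residuals = [quantize(frames[0], step)]
    ref_enc = residuals[0]  # encoder-side copy of the decoder's reconstruction
    for prev, cur in zip(frames, frames[1:]):
        if closed_loop:
            # Closed loop: predict from the reconstructed previous frame,
            # which is exactly what the decoder will have.
            r = quantize(cur - ref_enc, step)
            ref_enc = ref_enc + r
        else:
            # Open loop: predict from the original previous frame,
            # which the decoder never sees.
            r = quantize(cur - prev, step)
        residuals.append(r)
    # Decoder: start from the intra frame and accumulate the residuals.
    recon = [residuals[0]]
    for r in residuals[1:]:
        recon.append(recon[-1] + r)
    return recon


if __name__ == "__main__":
    frames = [0.3 * k for k in range(10)]  # slowly increasing toy signal
    for closed in (False, True):
        recon = encode_chain(frames, step=1.0, closed_loop=closed)
        err = max(abs(a - b) for a, b in zip(frames, recon))
        print("closed" if closed else "open", "loop, max error:", round(err, 2))
```

In the open loop, every residual of about 0.3 rounds to zero, so the decoder's error grows along the prediction chain (drift); in the closed loop, the encoder tracks the decoder's reconstruction, so the error on every frame stays within half a quantization step.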
The Group of Pictures (GOP) size is $M = 2^L - 1$, where $L$ is the number of temporal resolution levels (note again the difference from the GOP size and structure of a wavelet filterbank).
In Fig. 1, $D_1$ and $D_2$ denote delays that can be chosen so as to design different GOP encoding orders. For example, for $D_1 = Z^{-2^{L-1}}$, $D_2 = Z^{2^{L-1}}$, we get a symmetrical encoding structure, with an Intra frame in the middle of the GOP (see also Fig. 2), while for $D_1 = Z^{2^{L-1}}$, $D_2 = Z^{2(2^{L-1})}$ the Intra frame is encoded at the beginning of
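One way to read the symmetric structure (an interpretation assumed here, not a reproduction of Fig. 1 or Fig. 2) is as a complete binary tree with $2^L - 1$ nodes whose in-order traversal gives the display order: the root is the Intra frame at display position $2^{L-1}$, and each detail frame is predicted from its parent one level up. The Python sketch below computes, under that assumption, each frame's temporal level and reference, a coarse-to-fine coding order of the kind a frame-shuffling interleaver could present to a standard encoder, and the frames kept when only the coarsest temporal levels are decoded. All function names are hypothetical.

```python
from __future__ import annotations


def trailing_zeros(p: int) -> int:
    """Number of trailing zero bits of a positive integer p."""
    return (p & -p).bit_length() - 1


def temporal_level(p: int, L: int) -> int:
    """Level of display position p (1..2**L - 1): 0 is the Intra frame in
    the middle of the GOP, L - 1 is the finest detail level."""
    return L - 1 - trailing_zeros(p)


def reference(p: int, L: int) -> int | None:
    """Display position that p is predicted from (its parent in the assumed
    in-order binary tree), or None for the Intra frame."""
    v = trailing_zeros(p)
    if v == L - 1:
        return None
    step = 1 << v
    # p = step * odd; the parent is step * (odd +/- 1), choosing the sign
    # whose quotient is 2 mod 4, i.e. exactly one level coarser.
    return p - step if (p >> v) & 3 == 3 else p + step


def coding_order(L: int) -> list[int]:
    """Hypothetical interleaver: display positions sorted coarse-to-fine, so
    every reference is coded before the frames predicted from it."""
    M = (1 << L) - 1
    return sorted(range(1, M + 1), key=lambda p: (temporal_level(p, L), p))


def extract(L: int, keep_levels: int) -> list[int]:
    """Display positions retained when only levels 0..keep_levels-1 are decoded."""
    M = (1 << L) - 1
    return [p for p in range(1, M + 1) if temporal_level(p, L) < keep_levels]


if __name__ == "__main__":
    L = 3  # GOP of M = 7 frames
    print(coding_order(L))  # [4, 2, 6, 1, 3, 5, 7]
    print(extract(L, 2))    # [2, 4, 6]
    print({p: reference(p, L) for p in range(1, 8)})
```

Dropping the finest level of a 7-frame GOP keeps 3 of the 7 frames, roughly halving the frame rate; because every retained frame's reference lies at a coarser level, the reduced stream remains decodable in this model.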