3D VIDEO CODING USING REDUNDANT-WAVELET MULTIHYPOTHESIS AND MOTION-COMPENSATED TEMPORAL FILTERING

Yonghui Wang, Suxia Cui, and James E. Fowler
Department of Electrical and Computer Engineering
Engineering Research Center
Mississippi State University, Mississippi State, MS

ABSTRACT

A video coder is presented that combines mesh-based motion-compensated temporal filtering, phase-diversity multihypothesis motion compensation, and an embedded 3D wavelet-coefficient coder. The key contribution of this work is the introduction of the phase-diversity multihypothesis paradigm into motion-compensated temporal filtering, which is achieved by deploying temporal filtering in the domain of a spatially redundant wavelet transform. A regular triangle mesh is used to track motion between frames, and an affine transform between mesh triangles implements motion compensation within a lifting-based temporal transform. Experimental results reveal that the incorporation of phase-diversity multihypothesis into mesh-based motion-compensated temporal filtering significantly improves the rate-distortion performance of the 3D video coder.

1. INTRODUCTION

It has been generally recognized that the goal of highly scalable video representation is fundamentally at odds with the traditional motion-estimation/motion-compensation (ME/MC) feedback loop, which hinders the achievement of a high degree of resolution, temporal, and fidelity scalability. Consequently, the use of 3D transforms, which break the ME/MC feedback loop, is becoming the preferred approach to full scalability, and a number of modern 2D still-image algorithms have been straightforwardly extended to the third dimension (e.g., 3D-SPIHT [1]) by employing separable 3D wavelet transforms. This approach usually involves a wavelet-packet subband decomposition wherein a group of frames is processed with a temporal transform followed by spatial decomposition of each frame.
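The wavelet-packet ordering described above can be sketched in a few lines. This is a minimal illustration only, assuming orthonormal Haar filters and no motion compensation; the function names and the use of a random frame group are illustrative, not part of the paper's coder.

```python
import numpy as np

def haar_temporal(group):
    """One level of a (non-motion-compensated) temporal Haar
    transform over a group of frames of shape (T, H, W), T even."""
    even, odd = group[0::2], group[1::2]
    low = (even + odd) / np.sqrt(2.0)   # temporal lowpass subband
    high = (even - odd) / np.sqrt(2.0)  # temporal highpass subband
    return low, high

def haar_spatial(frame):
    """One level of a 2D Haar DWT of a single frame (H, W even),
    returning the LL, LH, HL, HH subbands."""
    # filter and decimate along rows
    a = (frame[:, 0::2] + frame[:, 1::2]) / np.sqrt(2.0)
    d = (frame[:, 0::2] - frame[:, 1::2]) / np.sqrt(2.0)
    # filter and decimate along columns
    ll = (a[0::2] + a[1::2]) / np.sqrt(2.0)
    lh = (a[0::2] - a[1::2]) / np.sqrt(2.0)
    hl = (d[0::2] + d[1::2]) / np.sqrt(2.0)
    hh = (d[0::2] - d[1::2]) / np.sqrt(2.0)
    return ll, lh, hl, hh

# wavelet-packet order: temporal transform first, then a spatial
# decomposition of every resulting temporal-subband frame
group = np.random.rand(4, 8, 8)
low, high = haar_temporal(group)
spatial_subbands = [haar_spatial(f) for f in np.concatenate([low, high])]
```

Because the temporal transform is applied before any spatial decomposition, dropping temporal-highpass subbands yields temporal scalability without disturbing the spatial hierarchy.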
However, without MC, temporal transforms produce low-quality temporal subbands with significant "ghosting" artifacts [2] and decreased coding efficiency. Consequently, there has been significant interest in motion-compensated temporal filtering (MCTF) [2-9], in which the temporal transform attempts to follow motion trajectories.

In this paper, we describe a 3D video coder using a 3D wavelet transform with MCTF. The salient aspect of this coder is that we employ multihypothesis motion compensation (MHMC) within the MCTF to combat the uncertainty inherent in estimating motion trajectories for MCTF, thereby achieving rate-distortion performance significantly superior to that of the usual single-hypothesis MCTF approach. Although multihypothesis has been used in conjunction with MCTF before (e.g., [8] and [9] propose both spatially and temporally diverse multihypothesis MCTF predictions), in our proposed system, we employ a new class of MHMC: phase-diversity multihypothesis [10]. Specifically, phase-diversity MHMC is implemented by deploying MCTF in the domain of a spatially redundant wavelet transform such that multiple hypothesis temporal filterings are combined implicitly in the form of an inverse transform. In essence, we combine the redundant-wavelet-multihypothesis (RWMH) paradigm we introduced in [10] with the 3D MCTF architecture emerging as the preferred approach to fully scalable video coding.

2. REDUNDANT-WAVELET MULTIHYPOTHESIS

MHMC [11] forms a prediction of pixel s(x, y) in the current frame as a combination of multiple predictions in an effort to combat the uncertainty inherent in the ME process. Assuming that the combination of these hypothesis predictions is linear, we have that the prediction of s(x, y) is

\tilde{s}(x, y) = \sum_i w_i(x, y) \, \tilde{s}_i(x, y),    (1)

where the multiple predictions \tilde{s}_i(x, y) are combined according to some weights w_i(x, y).
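The linear combination of Eq. (1) amounts to a per-pixel weighted sum of the hypothesis predictions. A minimal sketch, with illustrative names and data (not from the paper):

```python
import numpy as np

def combine_hypotheses(hypotheses, weights):
    """Linear multihypothesis combination of Eq. (1):
    s~(x, y) = sum_i w_i(x, y) * s~_i(x, y),
    for hypotheses and weights each of shape (N, H, W)."""
    return np.sum(weights * hypotheses, axis=0)

# two hypothesis predictions of a 4x4 frame, equally weighted
h = np.stack([np.full((4, 4), 10.0), np.full((4, 4), 14.0)])
w = np.full((2, 4, 4), 0.5)  # per-pixel weights summing to 1 at each pixel
pred = combine_hypotheses(h, w)  # every pixel predicted as 12.0
```

With weights that sum to one at each pixel, the combined prediction averages out uncorrelated errors across the individual hypotheses.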
A number of multihypothesis techniques for MC have been proposed in the past, including fractional-pixel-accurate MC, B-frames, overlapped block MC, and multiple reference frames. These techniques employ multiple predictions that are diverse spatially or temporally to improve the overall predictive ability of the system. In [10], we introduced a new class of MHMC, phase-diversity MHMC, in which the multihypothesis-prediction concept is extended into the transform domain. Specifically, we performed ME and MC in the domain of a redundant, or overcomplete, wavelet transform, and used multiple predictions that were diverse in transform phase.

The redundant discrete wavelet transform (RDWT) is an approximation to the continuous wavelet transform that, in essence, removes the downsampling operator from the traditional critically sampled transform to produce an overcomplete representation. As illustrated in Fig. 1, the size of each subband of an RDWT is the same as that of the input signal. Additionally, a J-scale RDWT can be considered to be composed of 4^J distinct critically sampled transforms, each corresponding to the choice between even- and odd-phase subsampling in both the horizontal and vertical directions at each scale of decomposition. In the RWMH paradigm outlined in [10], each one of these critically sampled transforms "views" motion from a different perspective and thus forms an independent hypothesis of the true motion of the video sequence. The inverse RDWT combines these multiple hypotheses into a single prediction. In the system of [10], this prediction is incorporated into the MC feedback loop of a hybrid video-coding architecture employing block-based ME/MC. Below, we introduce the RWMH concept into the MCTF framework to eliminate the MC feedback loop and produce a 3D video coder.

In Proceedings of the IEEE International Conference on Image Processing, Barcelona, Spain, September 2003, vol. 2, pp. 755-758.
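The RDWT structure described above, full-size subbands containing 4^J embedded critically sampled transforms, can be sketched for a single scale (J = 1, hence 4 phases). This is an illustrative toy, assuming Haar filters and circular extension; the function names are not from the paper, and a practical RDWT would use the à trous algorithm with the coder's actual filters.

```python
import numpy as np

def rdwt_level1(x):
    """One level of a 2D redundant (undecimated) Haar DWT with
    circular extension; every subband has the same size as x."""
    def low(a, axis):  return (a + np.roll(a, -1, axis)) / np.sqrt(2.0)
    def high(a, axis): return (a - np.roll(a, -1, axis)) / np.sqrt(2.0)
    lr, hr = low(x, 1), high(x, 1)          # filter along rows, no decimation
    return {'LL': low(lr, 0), 'LH': high(lr, 0),
            'HL': low(hr, 0), 'HH': high(hr, 0)}

def phase_subsample(subbands, py, px):
    """Extract one of the 4 critically sampled transforms embedded in a
    1-scale RDWT by choosing even/odd phase (py, px) in each direction."""
    return {k: v[py::2, px::2] for k, v in subbands.items()}

x = np.random.rand(8, 8)
sb = rdwt_level1(x)
# each RDWT subband matches the input size (overcomplete representation)
assert all(v.shape == x.shape for v in sb.values())
# the 4 phase choices yield 4 distinct critically sampled transforms
phases = [phase_subsample(sb, py, px) for py in (0, 1) for px in (0, 1)]
```

Each phase-subsampled transform sees the input through a different sampling grid, which is precisely the per-phase "view" of motion that RWMH exploits as an independent hypothesis.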