INTRA PREDICTION VERSUS WAVELETS AND LAPPED TRANSFORMS IN AN H.264/AVC CODER

Rafael G. de Oliveira and Ricardo L. de Queiroz

Departamento de Engenharia Elétrica, Universidade de Brasília, Brazil
rafael@image.unb.br, queiroz@ieee.org

ABSTRACT

H.264/AVC is the latest video coding standard and, among other things, it uses a DCT-like transform and intra prediction modes. We study the possibility of replacing the modified DCT stages with lapped and wavelet transforms. Because those transforms overlap neighboring blocks, intra-frame prediction, which is block-recursive by nature, is not feasible and is therefore turned off. In essence, this paper compares lapped (wavelet) transforms with linear prediction schemes within the AVC framework. Results indicate that lapped transforms can outperform the intra prediction scheme, especially for high-definition images.

Index Terms: H.264/AVC, wavelets, intra prediction, lapped transforms, image coding

1. INTRODUCTION

In 2004, the Joint Video Team adopted a new video coding standard, H.264/AVC [1]-[3]. It is now considered the state of the art in video compression, which was accomplished by the adoption of a number of innovative features such as quarter-pixel motion estimation precision, arbitrary reference frames, variable-size macroblock partitions, an in-loop deblocking filter, intra-frame prediction, context-adaptive binary arithmetic coding (CABAC), and variable block size transforms (4×4 and 8×8).

Even though H.264/AVC was meant to be a video coding standard, when used for intra-frame coding it works as a formidable still image coder. Surprisingly, it outperforms JPEG 2000 [4]-[6], which is considered the state of the art among image compression standards [7]. Most of the performance improvement may be attributed to the last three of the features cited above, combined with rate-distortion optimization (RDO).

Since the early development of lapped transforms [8], they have been compared to the DCT and wavelets.
For example, [9] contains many comparisons among transforms such as the discrete cosine transform (DCT) [10], the 9/7-tap wavelet transform (WT) used in JPEG-2000 [7],[11], and lapped transforms (LT) such as the lapped orthogonal transform (LOT) [12], the generalized LOT (GenLOT) [13], and the generalized lapped biorthogonal transforms (GLBT) [14]. The results point to a small but important improvement achieved by using lapped transforms for all codecs tested. The comparisons were carried out for JPEG [15] and SPIHT [16] coders using proper adaptations [17],[18]. Recently, WT vs. LT comparisons have also been carried out in the context of JPEG-2000 [19].

Despite all these comparisons, the literature offers little information on how intra prediction compares with lapped transforms or wavelets. This is the focus of this work. Only grayscale images and intra-coded frames are considered here.

This work was supported by Conselho Nacional de Desenvolvimento Científico e Tecnológico, CNPq, Brasil, under grant 474912/2006-0.

2. INTRA PREDICTION, LAPPED TRANSFORMS AND WAVELETS AS COMPETITORS

In block transforms like the DCT, only pixels inside the block are considered for the transform. That may be attractive for many reasons. However, it does not take advantage of the correlation between neighboring blocks and may cause blocking effects, i.e., discontinuities at block edges due to quantization errors. In H.264/AVC, this last effect is mitigated by the use of a deblocking filter.

In order to exploit the redundancy among neighboring blocks, H.264/AVC has intra-prediction modes that use previously encoded pixels to predict the current block. Only the residue, that is, the difference between the actual and the predicted pixels, is encoded. The process is recursive, which hinders parallel processing. However, it makes it possible to vary the size of the block. H.264/AVC has 9 prediction modes for 4×4-pixel blocks, 9 for 8×8 blocks and 4 for 16×16 blocks.
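As a rough illustration of the prediction-plus-residue idea described above, the following Python sketch predicts a 4×4 block from its previously decoded neighbors and keeps only the residue. It is a simplified sketch under stated assumptions, not the coder used in this work: only three of the nine 4×4 modes (vertical, horizontal and DC) are included, the mode is chosen by the sum of absolute differences (SAD) rather than by RDO, and the function names (predict_4x4, residual, sad, best_mode) are hypothetical.

```python
def predict_4x4(top, left, mode):
    """Build a 4x4 prediction from previously decoded neighbors.

    top  : the 4 pixels directly above the block
    left : the 4 pixels directly to the left of the block
    mode : "vertical", "horizontal" or "dc" (a subset of the 9 modes)
    """
    if mode == "vertical":      # copy the row above downwards
        return [list(top) for _ in range(4)]
    if mode == "horizontal":    # copy the left column rightwards
        return [[left[r]] * 4 for r in range(4)]
    if mode == "dc":            # flat block at the rounded neighbor mean
        dc = (sum(top) + sum(left) + 4) >> 3
        return [[dc] * 4 for _ in range(4)]
    raise ValueError(f"unknown mode: {mode}")


def residual(block, pred):
    """Residue = actual pixels minus predicted pixels."""
    return [[block[r][c] - pred[r][c] for c in range(4)] for r in range(4)]


def sad(res):
    """Sum of absolute differences of a residue block."""
    return sum(abs(v) for row in res for v in row)


def best_mode(block, top, left):
    """Pick the mode with the smallest SAD (assumption: stand-in for RDO)."""
    candidates = {}
    for mode in ("vertical", "horizontal", "dc"):
        res = residual(block, predict_4x4(top, left, mode))
        candidates[mode] = (sad(res), res)
    mode = min(candidates, key=lambda m: candidates[m][0])
    return mode, candidates[mode][1]


# Usage: a block of vertical stripes is matched exactly by the vertical mode.
top = [10, 20, 30, 40]
left = [10, 10, 10, 10]
block = [[10, 20, 30, 40] for _ in range(4)]
mode, res = best_mode(block, top, left)
```

Because top and left must already be decoded before the current block can be predicted, blocks have to be processed in order; this is the recursion that prevents combining intra prediction with overlapping transforms.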

On the other hand, lapped transforms and wavelets already have supports larger than the block being encoded. Their filters are non-recursive and have finite impulse response, which allows for parallel processing. In Fig. 1 it is