Transform-Domain Intra Prediction for H.264

Chen Chen and Ping-Hao Wu
Graduate Institute of Communications Engineering, National Taiwan University, Taipei, Taiwan 10617, R.O.C.
Email: {r92942081, b89901043}@ntu.edu.tw

Homer Chen
Department of Electrical Engineering, GICE, and INM, National Taiwan University, Taipei, Taiwan 10617, R.O.C.
Email: homer@cc.ee.ntu.edu.tw

Abstract: H.264/AVC is the newest video coding standard jointly developed by the ITU-T Video Coding Experts Group and the ISO/IEC Moving Picture Experts Group. In contrast to some previous coding standards such as H.263+ and MPEG-4 Part-2, where intra prediction is performed in the transform domain, the intra prediction of H.264 is completely defined in the pixel domain. This presents a challenge to multimedia systems in which transcoding is conducted in the transform domain for the purpose of computational efficiency. In this paper, we show how to obtain the transform-domain predictions for various intra modes of H.264. We begin by converting the intra prediction from the pixel domain to the transform domain through matrix manipulation. Then we show how the operations involved in the matrix manipulation can be simplified. A computational complexity analysis of each intra prediction mode of H.264 is provided.

I. INTRODUCTION

In view of the successful adoption of MPEG-2 technology in the DVD and DTV industries and the growing market potential of the newly developed H.264 standard, we are motivated to investigate the conversion of existing MPEG-2 multimedia content to the H.264 format. The overall architecture of our proposed scheme for MPEG-2 to H.264 transcoding is described in [6]. In this paper, we present the details of our transform-domain approach for H.264 intra prediction.

H.264/AVC [1] is the newest video coding standard jointly developed by the ITU-T Video Coding Experts Group and the ISO/IEC Moving Picture Experts Group.
It provides better compression efficiency than previous video coding standards.

Transcoding is one of the key technologies for enabling interoperability between different systems and devices [1], [2]. The purpose of transcoding is to convert a multimedia signal from one format to another. The format conversion is characterized by parameters such as bitrate, frame rate, and coding standard.

In contrast to some previous coding standards such as H.263+ and MPEG-4 Part-2, where intra prediction is performed in the transform domain, the intra prediction of H.264 is completely defined in the pixel domain by referring to neighboring samples of the previously coded blocks. Transcoding in the transform (or compressed) domain has been a desirable approach [3], [4] because it avoids complete decoding and re-encoding, which is computationally expensive. Therefore, there is a need to investigate transform-domain intra prediction for H.264.

This paper is organized as follows. In Section II, we describe the proposed transform-domain intra prediction algorithm. The complexity of the algorithm is presented in Section III. Finally, a conclusion is given in Section IV.

II. INTRA PREDICTION

In H.264, there are a total of 9 prediction modes for a 4×4 sub-block and 4 for a 16×16 macroblock. If a sub-block or a macroblock is to be coded in intra mode, a prediction block is formed based on the neighboring samples of previously coded blocks that are to the left of and/or immediately above the block to be coded. The prediction block is then subtracted from the current block prior to encoding. In this section, the transform-domain intra prediction for luma samples is discussed. The transform-domain intra prediction of chroma samples can be derived in a similar way and is not described here.

A. Intra 4×4

The intra prediction of 4×4 blocks defined by H.264 is illustrated in Fig.
1(a), where the symbols a to p denote the pixels of the current block, and the symbols A to M denote the neighboring pixels from which the prediction block is calculated. The direction of prediction is shown in Fig. 1(b). Due to the page limit, we choose the first four modes of the 4×4 intra prediction to illustrate our approach.

Figure 1. Illustration of 4×4 intra prediction, panels (a) and (b).

1) Vertical

The prediction samples in this mode (mode 0) are obtained from the four neighboring pixels (A to D) above the block to be coded. Sample A is copied to every pixel in the first column of the block, sample B is copied to every pixel in the second column of the block, and so on.

This work was supported in part by the National Science Council of Taiwan under contract NSC93-2752-E-002-006-PAE.

0-7803-8834-8/05/$20.00 ©2005 IEEE.
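
As an illustration of the vertical mode (mode 0) described above, the following minimal Python sketch builds the pixel-domain prediction block from the samples A to D and compares its transform with a shortcut computed directly in the transform domain. It is a sketch under stated assumptions, not necessarily the derivation used in this paper: the matrix CF is the 4×4 forward core transform of H.264 with scaling and quantization omitted, and the function names (predict_vertical, core_transform, predict_vertical_transform) are hypothetical names chosen for this example. Because every row of the vertical prediction block is identical and only the first row of CF has a nonzero sum (equal to 4), the transformed block has nonzero coefficients only in its first row.

```python
# 4x4 forward core transform matrix of H.264 (scaling/quantization omitted).
CF = [
    [1, 1, 1, 1],
    [2, 1, -1, -2],
    [1, -1, -1, 1],
    [1, -2, 2, -1],
]


def matmul(a, b):
    """Multiply two matrices given as lists of rows."""
    return [
        [sum(a[i][k] * b[k][j] for k in range(len(b))) for j in range(len(b[0]))]
        for i in range(len(a))
    ]


def transpose(m):
    """Return the transpose of a matrix given as a list of rows."""
    return [list(col) for col in zip(*m)]


def predict_vertical(above):
    """Pixel-domain mode 0: copy the samples A..D down each column."""
    return [list(above) for _ in range(4)]


def core_transform(block):
    """Compute CF * block * CF^T for a 4x4 block."""
    return matmul(matmul(CF, block), transpose(CF))


def predict_vertical_transform(above):
    """Transform-domain shortcut (illustrative): first row is 4 * CF * above,
    all remaining rows are zero."""
    first_row = [4 * sum(CF[v][j] * above[j] for j in range(4)) for v in range(4)]
    return [first_row] + [[0, 0, 0, 0] for _ in range(3)]


if __name__ == "__main__":
    above = [10, 20, 30, 40]  # hypothetical values for samples A, B, C, D
    full = core_transform(predict_vertical(above))
    fast = predict_vertical_transform(above)
    print(full == fast)
    print(fast[0])
```

The shortcut needs a single 4-point transform of the row (A, B, C, D) instead of a full two-dimensional transform of the prediction block, which is the kind of saving a transform-domain formulation aims for.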