Optimized Hardware Implementation for Forward Quantization of H.264/AVC

G. A. Ruiz & J. A. Michell

Received: 2 June 2009 / Revised: 27 April 2010 / Accepted: 4 September 2012
© Springer Science+Business Media, LLC 2012

Abstract An efficient implementation for computing the forward quantization of H.264/AVC is presented. It uses a reformulation of the quantization expressions, in full compliance with the standard, combined with an adaptive truncated Booth multiplier to reduce hardware complexity. The C code of the JM reference software has been rewritten to analyze the effect of the proposed approach. Simulations carried out on several typical video sequences with different texture characteristics demonstrate the validity of this approach: at low QP the PSNR improves by between 0.3 dB and 0.8 dB, with a slight increase in the bit rate of about 0.8 %. However, this improvement diminishes for typical values of QP, where only an insignificant difference is found with respect to the JM results. The proposed architecture, synthesized in the AMS 0.35 μm technology and suitable for VLSI implementation, reduces the area by 26 %, the power by 32 % and the critical path delay by 21 % in comparison with a classic implementation.

Keywords Quantization · H.264/AVC · Truncated Booth multiplier

1 Introduction

The H.264/AVC (Advanced Video Coding) standard is the latest video coding standard established by the Joint Video Team of ITU-T VCEG and ISO/IEC MPEG [1]. Compared with previous MPEG standards, H.264 provides a compression ratio more than twice as high, with higher video coding quality. However, the computational complexity of H.264 video coding is much higher than that of the previous MPEG standards, so real-time H.264 coding requires dedicated hardware designs. Therefore, low-cost, low-power hardware implementations and high-quality H.264 video coding are emerging trends.
G. A. Ruiz (*) · J. A. Michell
Department of Electronics and Computers, Facultad de Ciencias, University of Cantabria, Avda. De los Castros s/n, Santander, Spain
e-mail: ruizrg@unican.es

J Sign Process Syst
DOI 10.1007/s11265-012-0693-3

H.264 introduces a number of features that differ from existing standards to support various applications. Figure 1 shows the block diagram of the H.264 encoding algorithm. The input video frames are passed to the intra or inter prediction part. Multiple reference frames and variable block size motion estimation are used for inter prediction. The best mode among these prediction modes is chosen in the mode selection block. The prediction is subtracted from the input block to form the residual block. This residual block is transformed by a 4×4 integer DCT for luminance and a 2×2 transform for chrominance DC coefficients. Scan and quantization procedures are then applied to the coefficients. The standard assumes a scalar forward quantizer, performed at the encoder by a simple scale-and-shift formula that can be implemented directly in integer arithmetic. The step size of the quantizer is controlled by a quantization parameter (QP), which supports 52 different values, from 0 to 51 in increments of one, and enables the encoder to control the trade-off between bit rate and quality. According to the notation in [2], each transform coefficient with value $W_{ij}$ ($i, j = 0$ to $3$) is quantized to the coefficient $Z_{ij}$ with the following equation:

$$|Z_{ij}| = \left(|W_{ij}| \cdot MF_{ij} + F\right) \gg qbits, \qquad \operatorname{sign}(Z_{ij}) = \operatorname{sign}(W_{ij}) \tag{1}$$

where $MF_{ij}$ is the multiplication factor, made up of 6×3 arrays of 14-bit positive integers, $qbits = 15 + \lfloor QP/6 \rfloor$, ≫