A Hardware Solution for the HEVC Fractional Motion Estimation Interpolation

Henrique Maich #1, Vladimir Afonso #2, Denis Franco #3, Marcelo Porto #4, Luciano Agostini #5
# Federal University of Pelotas (UFPel), Pelotas, Brazil
1 hdamaich@inf.ufpel.edu.br
2 vafonso@inf.ufpel.edu.br
3 denis.franco@ufpel.edu.br
4 porto@inf.ufpel.edu.br
5 agostini@inf.ufpel.edu.br

Abstract: Nowadays many devices, including portable ones such as smartphones and tablets, can handle digital videos, especially in high definition. However, high definition videos require a large amount of data to be represented. The current video coding standards use a set of new techniques to increase their coding efficiency. One of these techniques, used by H.264 and HEVC (High Efficiency Video Coding), is the Fractional Motion Estimation (FME). One of the main steps of the FME is the interpolation step, which is responsible for generating the samples at fractional positions. This paper presents a hardware design focusing on the interpolation step of the FME for the emerging HEVC standard. The designed architecture was described in VHDL and synthesized for an Altera Stratix III FPGA. The architecture is able to generate the fractional samples for videos with QFHD (3840 x 2160 pixels) resolution in real time at 48 frames per second.

Keywords: FME, Fractional Motion Estimation, FME HEVC, Interpolation HEVC, FME Interpolation

I. INTRODUCTION

High definition digital videos are used in many devices, such as DVD and Blu-ray players, digital TVs and smartphones, among others. Since a great amount of data must be processed, it is extremely important to compress these videos. In addition, the real-time processing of high definition videos is associated with a high computational complexity, mainly due to the compression techniques required. For portable devices, this high complexity is a major restriction, and a dedicated hardware design is an efficient solution to this problem.
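To give a sense of scale for the real-time target stated in the abstract (QFHD at 48 frames per second), the short Python sketch below works out the implied workload. It is only a back-of-the-envelope estimate: the function name `workload` is hypothetical, and the calculation assumes that the frame is tiled into non-overlapping 8x8 blocks (the block size this design targets) and that each block is interpolated exactly once per frame, which ignores multiple reference frames and any overlap between neighboring search areas.

```python
# Workload implied by the target stated in the abstract: QFHD at 48 fps.
WIDTH, HEIGHT, FPS = 3840, 2160, 48  # figures taken from the abstract
BLOCK = 8                            # 8x8 blocks, the size targeted by this design


def workload(width=WIDTH, height=HEIGHT, fps=FPS, block=BLOCK):
    """Return (samples per second, blocks per frame, blocks per second).

    Assumption (not from the paper): the frame is tiled into
    non-overlapping blocks and each block is processed once per frame.
    """
    samples_per_second = width * height * fps
    blocks_per_frame = (width // block) * (height // block)
    blocks_per_second = blocks_per_frame * fps
    return samples_per_second, blocks_per_frame, blocks_per_second


samples_s, blocks_f, blocks_s = workload()

# Average time available to process one block, in nanoseconds.
budget_ns = 1e9 / blocks_s

print(f"integer samples per second: {samples_s}")
print(f"8x8 blocks per frame:       {blocks_f}")
print(f"8x8 blocks per second:      {blocks_s}")
print(f"time budget per block (ns): {budget_ns:.1f}")
```

Under these assumptions the encoder has to consume roughly 398 million integer samples per second, or about 6.2 million 8x8 blocks per second, which leaves an average of about 160 ns per block. Figures of this order illustrate why a dedicated hardware design is attractive for this step.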
A video encoder uses different steps to reduce some types of data redundancy present in a video sequence. The inter-frame prediction identifies and reduces the temporal redundancy that exists between temporally neighboring frames of a video sequence. The inter-frame prediction includes a step called motion estimation (ME), which is the most important step in a video encoder, since it is the one that provides the largest compression gains [1]. To apply the ME, each video frame is subdivided into small blocks, called PUs (Prediction Units), before coding. The ME step then searches previously encoded frames (called reference frames) for the PU that is most similar to the PU being encoded. When this PU is identified in the reference frame, the ME generates a motion vector, relative to the position of the PU being encoded, that allows it to be located. The differences between the PU being encoded and the matching PU in the reference frame are then calculated, so that only the motion vector and the difference between these PUs are sent to the next step of the encoder.

In HEVC (High Efficiency Video Coding), PUs can have variable sizes, and the size is selected according to the resulting image quality and compression rate [2]. Furthermore, it is possible to apply a refinement step to the ME, called Fractional Motion Estimation (FME). The FME can increase the ME gains by using sub-pixel positions in addition to integer positions [3]. Finally, HEVC uses some innovative techniques to compress the motion vectors [2], further increasing the compression gains. The FME step can be divided into: (a) the interpolation unit, responsible for generating the fractional positions, and (b) the search for the best match between the block being encoded and previously encoded blocks, considering the generated positions. This paper presents the hardware design for the HEVC FME interpolation unit, considering 8x8 blocks.

II. EVALUATION WITH THE REFERENCE SOFTWARE

Since the HEVC inter-prediction complexity is very high, strategies to reduce this complexity are a relevant research topic, mainly for applications with processing and power consumption restrictions. However, these strategies must keep the quality and the compression rate at acceptable levels. The main part of the HEVC inter-frame computational complexity is related to the variable PU size, since each PU size must be evaluated in terms of its rate-distortion cost [2] to determine the best PU size. There are 24 PU sizes defined in the current version of HEVC, and this number can rise to 25 when the 4x4 size is enabled [2]. Some evaluations were therefore carried out in this work to discover which