Conditional Motion Vector Refinement for Improved Prediction

Haricharan Lakshman¹, Christian Rudat¹, Matthias Albrecht¹, Heiko Schwarz¹, Detlev Marpe¹ and Thomas Wiegand¹,²

¹ Fraunhofer Institute for Telecommunications, Heinrich Hertz Institute, Berlin, Germany.
² Image Communication Chair, Technical University of Berlin, Germany.

Abstract— This paper considers adapting the resolution of motion-compensated prediction in a video codec. A new motion search and signaling scheme is proposed that increases the accuracy of prediction without affecting the encoding complexity. The scheme uses information available to both encoder and decoder, e.g. the current slice type and the texture in the reference pictures, as a cue to control the motion vector accuracy, and it efficiently reuses motion information for predicting subsequent blocks. Compared to a fixed quarter-sample resolution, an average bit rate reduction of around 2.5% for P-pictures and 0.5% for B-pictures is observed, with no extra motion search.

I. INTRODUCTION

Motion-compensated prediction (MCP) with fractional-sample resolution is commonly used in video coding. The encoder transmits motion information and quantized prediction residuals, together with other side information, in the bitstream. In H.264/AVC [1], the motion vectors (MVs) have quarter-sample resolution. A theoretical analysis of the efficiency of fractional-sample filtering is provided in [2]. It considers a structure in which the reference samples first pass through interpolation filters that phase-delay the signal, followed by a Wiener filter. The interpolation filters provide the required fractional shift, while the Wiener filter smooths the prediction signal to counter noise in the signals and displacement estimation errors. In the current HEVC draft [3], however, an adaptive loop filter and a sample adaptive offset filter are introduced after the DPCM reconstruction stage, and these filters reduce the noise in the reference pictures.
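The view of an interpolation filter as a fractional shifter can be made concrete with a small Python sketch. It applies 8-tap luma interpolation filters to a 1-D row of samples and evaluates each filter's amplitude and realized fractional shift (phase delay) at one frequency. The coefficients are those of the final H.265/HEVC standard; this is an assumption for illustration, since the draft [3] may have used different values. The names `interpolate` and `response` are illustrative only.

```python
import cmath
import math

# Luma interpolation filters of the final H.265/HEVC standard (taps sum to 64).
# Tap n (n = 0..7) weights the integer sample at offset n - 3 from position x.
HALF = [-1, 4, -11, 40, 40, -11, 4, -1]     # intended fractional shift 1/2
QUARTER = [-1, 4, -10, 58, 17, -5, 1, 0]    # intended fractional shift 1/4


def interpolate(row, x, taps):
    """Predict the sample at x + shift from the integer samples of a 1-D row.

    Out-of-range positions are clamped to the row edges, and the 6-bit
    filter gain is removed with rounding (add 32, shift right by 6).
    """
    acc = 0
    for n, c in enumerate(taps):
        idx = min(max(x - 3 + n, 0), len(row) - 1)
        acc += c * row[idx]
    return (acc + 32) >> 6


def response(taps, omega):
    """Amplitude and realized fractional shift at normalized frequency omega > 0.

    For the input exp(j*omega*k), the output at x is H(omega)*exp(j*omega*x);
    the realized shift is the phase delay arg(H)/omega.
    """
    h = sum(c * cmath.exp(1j * omega * (n - 3)) for n, c in enumerate(taps)) / 64
    return abs(h), cmath.phase(h) / omega


ramp = [64 * k for k in range(12)]
print(interpolate(ramp, 4, HALF))              # 288, i.e. 64 * 4.5
for name, taps in (("half", HALF), ("quarter", QUARTER)):
    amp, shift = response(taps, math.pi / 2)
    print(f"{name}: |H| = {amp:.3f}, shift = {shift:.3f} samples")
```

Because the half-sample filter is symmetric about offset 1/2, its phase delay equals 1/2 wherever its real-valued response is positive, while its amplitude still deviates from 1 near higher frequencies; it is this kind of amplitude and phase behaviour that longer or adaptive filters try to improve.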
Therefore, the role of interpolation filtering is mainly to provide fractional shifts without significantly attenuating the high-frequency components of the reference signals. To improve interpolation filtering, techniques such as increasing the tap length [5], adapting the filter coefficients [6], and combining FIR and IIR filtering [7] have been proposed. In most cases, interpolation filters that induce the required phase delays over larger passbands need additional implementation complexity.

An alternative approach is to keep the filter length constant but to increase the number of filters available for MCP. Such a scheme would not increase decoder complexity, because only a single filter of the same length would be used for predicting each hypothesis. The larger set of filters would give the encoder more choice in selecting a phase delay during MCP. In this regard, one-eighth-sample MV resolution has been explored in several works, e.g. [8], [9]. However, the reported gains of one-eighth-sample motion resolution are limited, and in some cases it can even degrade RD performance because of the extra side information sent in the bitstream. Additionally, one-eighth-sample resolution requires extra search at the encoder compared to the commonly used quarter-sample resolution, which increases encoder complexity.

In this paper, we propose a new technique that modifies the order of fractional-sample motion estimation to achieve a resolution finer than one-quarter sample without increasing the complexity of motion estimation. Fractional-sample search is commonly accomplished by a hierarchical motion search that first determines an integer-sample MV, followed by half- and quarter-sample MVs. Within such a scheme, we use the half-sample search results to selectively search a one-sixth-sample grid instead of a quarter-sample grid.
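The modified search order can be illustrated with a minimal Python sketch. It is a hypothetical reading of the text, not the authors' implementation: MV components are stored in units of 1/12 sample (the least common multiple of the quarter- and one-sixth-sample grids), each refinement stage tests the eight neighbours of the current best position, and the callback `use_sixth` stands in for whatever decision the half-sample results drive. The names `refine`, `hierarchical_search` and `UNIT` are illustrative.

```python
from typing import Callable, Tuple

MV = Tuple[int, int]   # (x, y) in units of 1/12 sample (hypothetical representation)
UNIT = 12              # one integer sample = 12 units; 1/2 = 6, 1/4 = 3, 1/6 = 2


def refine(center: MV, step: int, cost: Callable[[MV], float]) -> MV:
    """Test the 8 neighbours of `center` at distance `step`; keep the cheapest."""
    best, best_cost = center, cost(center)
    for dy in (-step, 0, step):
        for dx in (-step, 0, step):
            if dx == 0 and dy == 0:
                continue
            cand = (center[0] + dx, center[1] + dy)
            c = cost(cand)
            if c < best_cost:
                best, best_cost = cand, c
    return best


def hierarchical_search(int_mv: MV, cost: Callable[[MV], float],
                        use_sixth: Callable[[MV], bool]) -> MV:
    """Integer MV -> half-sample stage -> quarter- or sixth-sample stage."""
    half = refine(int_mv, UNIT // 2, cost)
    # Hypothetical decision driven by the half-sample result. A 1/6 step from a
    # half-sample position lands on 1/3 or 2/3, and from an integer position on
    # +-1/6, so the result lies on the one-sixth grid in either case.
    last_step = UNIT // 6 if use_sixth(half) else UNIT // 4
    return refine(half, last_step, cost)


# Toy cost: squared distance to a "true" displacement of (1/3, -1/3) sample.
target = (4, -4)
cost = lambda mv: (mv[0] - target[0]) ** 2 + (mv[1] - target[1]) ** 2
print(hierarchical_search((0, 0), cost, lambda mv: True))    # (4, -4): 1/3, -1/3
print(hierarchical_search((0, 0), cost, lambda mv: False))   # (3, -3): 1/4, -1/4
```

Since the last stage tests eight candidates with either a 1/4 or a 1/6 step, the number of cost evaluations is identical in both branches, which is the property that keeps the encoder complexity unchanged in this sketch.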
Searching the one-sixth-sample grid in place of the quarter-sample grid keeps the encoder complexity the same as for quarter-sample search. However, signaling a location on a full one-sixth-sample grid requires more motion information. Therefore, we operate the MV predictor at regular quarter-sample resolution and first transmit the motion vector difference (MVD) at quarter-sample resolution. We then define a set of conditions based on the current slice type, the texture in the referenced pictures, the number of hypotheses, etc., which controls the transmission of MV refinement information that can yield one-sixth-sample resolution. The new fractional positions are generated using FIR interpolation filters, as in the quarter-sample case. This enlarges the set of available amplitude responses and phase delays, giving the encoder more opportunity to choose the best operating point for coding a given block. The decoder follows the same steps to determine whether refinement information is present for each MV, so the MV resolution is never signaled explicitly in the bitstream. The refinement information is further reused for subsequent prediction blocks that have the same MV, via the Merge mode in HEVC [3]. Chroma MCP uses the luma MV, scaled according to the chroma downsampling.

We compare the proposed technique with a scheme that uses a fixed quarter-sample grid and measure the performance gains. Finally, the test is extended to include extra motion search and signaling, to estimate the RD performance improvement obtainable when more encoder complexity is spent within the same architecture.

978-1-4577-2049-9/12/$26.00 ©2012 IEEE PCS 2012 May 7-9, 2012, Kraków, Poland 497 2012 Picture Coding Symposium
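The decoder-side logic described in the introduction can be sketched in Python as follows. Everything concrete here is an assumption made for illustration, not the paper's specification: the fields of `BlockContext`, the rule in `refinement_present`, the value of `TEXTURE_THRESHOLD`, the one-bit mapping from odd quarter positions to neighbouring one-sixth positions, and the 4:2:0 chroma format in `chroma_offset`. What the sketch does reflect from the text is the order of operations: a quarter-sample predictor plus MVD, followed by refinement bits that are parsed only when conditions known to both sides hold, with no explicit resolution flag.

```python
from dataclasses import dataclass
from typing import Iterator, Tuple

TEXTURE_THRESHOLD = 100.0  # hypothetical value, not taken from the paper


@dataclass
class BlockContext:
    """Information known to both encoder and decoder (hypothetical fields)."""
    slice_type: str       # "P" or "B"
    num_hypotheses: int   # 1 = uni-prediction, 2 = bi-prediction
    ref_texture: float    # e.g. sample variance of the referenced block


def refinement_present(ctx: BlockContext) -> bool:
    """Hypothetical condition set, evaluated identically at encoder and decoder."""
    return (ctx.slice_type == "P" and ctx.num_hypotheses == 1
            and ctx.ref_texture > TEXTURE_THRESHOLD)


def decode_component(pred_q: int, mvd_q: int, bits: Iterator[int],
                     refine: bool) -> int:
    """Return one MV component in units of 1/12 sample.

    Predictor and MVD are in quarter-sample units. Phases 0 and 1/2 already
    lie on the one-sixth grid; at phase 1/4 or 3/4 one refinement bit selects
    a neighbouring one-sixth phase (1/4 -> 1/6 or 1/3, 3/4 -> 2/3 or 5/6).
    This mapping is an assumption.
    """
    v = 3 * (pred_q + mvd_q)          # quarter-sample MV rescaled to 1/12 units
    if refine and v % 12 in (3, 9):   # Python's % keeps the phase in 0..11
        v += 1 if next(bits) else -1
    return v


def decode_mv(pred_q: Tuple[int, int], mvd_q: Tuple[int, int],
              ctx: BlockContext, bits: Iterator[int]) -> Tuple[int, int]:
    refine = refinement_present(ctx)  # derived, never parsed from the bitstream
    return (decode_component(pred_q[0], mvd_q[0], bits, refine),
            decode_component(pred_q[1], mvd_q[1], bits, refine))


def chroma_offset(v_luma: int) -> Tuple[int, int]:
    """Assuming 4:2:0: split a luma component (1/12 luma sample) into an
    integer chroma offset and a phase in units of 1/24 chroma sample."""
    return v_luma // 24, v_luma % 24


p_ctx = BlockContext("P", 1, 250.0)
b_ctx = BlockContext("B", 2, 250.0)
mv = decode_mv((4, -2), (1, 1), p_ctx, iter([1, 0]))
print(mv)                                            # (16, -4): 4/3 and -1/3 sample
print(decode_mv((4, -2), (1, 1), b_ctx, iter([])))   # (15, -3): plain quarter-sample MV
print(chroma_offset(mv[0]), chroma_offset(mv[1]))    # (0, 16) (-1, 20)
```

Under this representation, a Merge candidate would simply copy the stored 1/12-unit vector, so a refined MV is inherited by subsequent blocks without any further refinement bits.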