Conditional Motion Vector Refinement for Improved Prediction

Haricharan Lakshman 1, Christian Rudat 1, Matthias Albrecht 1, Heiko Schwarz 1, Detlev Marpe 1 and Thomas Wiegand 1,2
1 Fraunhofer Institute for Telecommunications, Heinrich Hertz Institute, Berlin, Germany.
2 Image Communication Chair, Technical University of Berlin, Germany.

Abstract: Adapting the resolution of motion compensated prediction in a video codec is considered in this paper. A new motion search and signaling scheme is proposed that increases the accuracy of prediction without affecting the complexity of encoding. It uses information available to both encoder and decoder (e.g., the current slice type and the texture in reference pictures) as a cue to control motion vector accuracy, and it efficiently reuses motion information for predicting subsequent blocks. An average bit rate reduction of around 2.5% for P-pictures and 0.5% for B-pictures is observed with no extra search compared to a fixed quarter-sample resolution.

I. INTRODUCTION

Motion compensated prediction (MCP) using fractional-sample resolution is commonly used in video coding. The encoder transmits motion information and quantized prediction residuals, together with other side information, in the bitstream. In H.264/AVC [1], the motion vectors (MVs) have a quarter-sample resolution. A theoretical analysis of the efficiency of fractional-sample filtering is provided in [2]. It considers a structure in which the reference samples are first passed through interpolation filters to phase delay the signal, followed by a Wiener filter. The interpolation filters provide the required fractional shift, while the Wiener filter smooths the prediction signal to counter the noise in the signals and the displacement estimation errors. However, in the current HEVC draft [3], an adaptive loop filter and a sample adaptive offset filter are introduced after the DPCM reconstruction stage, which reduce noise in the reference pictures.
Therefore, the role of interpolation filtering is mainly to provide fractional shifts without introducing much attenuation of the high-frequency components of the reference signals. To improve the interpolation filtering, techniques such as increasing the tap length [5], adapting the filter coefficients [6], and combined FIR and IIR filtering [7] have been proposed. In most cases, interpolation filters that induce the required phase delays over larger passbands need additional implementation complexity.

An alternative approach is to keep the filter length constant, but to increase the number of available filters for MCP. Such a scheme would not increase the complexity of a decoder, because only a single filter of the same length would be used for predicting each hypothesis. The larger set of filters could be used to give the encoder more choice in selecting a phase delay during MCP. In this regard, one-eighth sample resolution for MVs has been explored in several works, e.g. [8], [9]. However, the reported gains due to a one-eighth sample motion resolution are limited, and in some cases the rate-distortion (RD) performance could even degrade due to the extra side information sent in the bitstream. Additionally, one-eighth resolution requires extra search on the encoder side compared to the commonly used one-quarter resolution, resulting in an increase of encoder complexity.

In this paper, we propose a new technique that modifies the order of fractional-sample motion estimation to achieve a resolution higher than one-quarter sample without increasing the complexity of motion estimation. A hierarchical motion search is commonly used to accomplish fractional-sample search by first generating an integer-sample MV, followed by half-sample and quarter-sample MVs. In such a scheme, we use the half-sample search results to selectively search in a one-sixth sample grid instead of a one-quarter grid.
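
The hierarchical search described above can be illustrated with a minimal Python sketch. It is not the implementation used in this paper: the 8-neighbour pattern, the function names, the `use_sixth_grid` predicate (a stand-in for the selection rule, which is not specified at this point in the text) and the toy cost function are all assumptions made for illustration. The sketch only shows that replacing the final one-quarter step by a one-sixth step leaves the number of cost evaluations unchanged.

```python
from fractions import Fraction

# Assumed search pattern: the 8 neighbours around the current best position.
NEIGHBOURS = [(dx, dy) for dy in (-1, 0, 1) for dx in (-1, 0, 1)
              if (dx, dy) != (0, 0)]


def refine(center, center_cost, step, cost):
    """Test the 8 neighbours of `center` at spacing `step`.

    Returns the best MV, its cost, and the number of cost evaluations made.
    """
    best, best_cost, evaluations = center, center_cost, 0
    for dx, dy in NEIGHBOURS:
        cand = (center[0] + dx * step, center[1] + dy * step)
        cand_cost = cost(cand)
        evaluations += 1
        if cand_cost < best_cost:
            best, best_cost = cand, cand_cost
    return best, best_cost, evaluations


def hierarchical_search(integer_mv, cost, use_sixth_grid):
    """Integer MV -> half-sample stage -> one final fractional stage.

    `use_sixth_grid` is a hypothetical predicate on the half-sample result
    that selects a 1/6 step instead of the usual 1/4 step.
    """
    mv = (Fraction(integer_mv[0]), Fraction(integer_mv[1]))
    mv_cost = cost(mv)
    mv, mv_cost, n_half = refine(mv, mv_cost, Fraction(1, 2), cost)
    step = Fraction(1, 6) if use_sixth_grid(mv) else Fraction(1, 4)
    mv, mv_cost, n_fine = refine(mv, mv_cost, step, cost)
    return mv, n_half + n_fine


# Toy cost (assumption): squared distance to a "true" displacement.
TRUE_MV = (Fraction(2, 3), Fraction(-1, 3))


def toy_cost(mv):
    return (mv[0] - TRUE_MV[0]) ** 2 + (mv[1] - TRUE_MV[1]) ** 2


mv_sixth, evals_sixth = hierarchical_search((1, 0), toy_cost, lambda mv: True)
mv_quarter, evals_quarter = hierarchical_search((1, 0), toy_cost, lambda mv: False)
```

In this toy setup both calls perform the same number of fractional-stage cost evaluations; only the spacing of the last stage differs, so the one-sixth variant can reach positions such as 1/3 and 2/3 that a quarter-sample grid does not contain.
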
Because the one-sixth sample search takes the place of the one-quarter sample search, the encoder complexity stays the same as for a one-quarter sample search. Signaling a location in a full one-sixth grid, however, needs more motion information. Therefore, we operate the MV predictor in the regular quarter-sample resolution and first transmit the motion vector difference (MVD) in quarter-sample resolution. We then define a set of conditions, using the current slice type, the texture in the referenced pictures, the number of hypotheses, etc., to control the transmission of MV refinement information that can yield one-sixth sample resolution. The new fractional positions are generated using FIR interpolation filters, as in the quarter-sample case. The set of possible amplitude responses and phase delays is therefore enlarged, giving the encoder more opportunity to choose the best operating point for coding a given block. The decoder follows the same steps to determine whether refinement information exists for each MV, without any explicit signaling of MV resolution in the bitstream. The refinement information is further reused for subsequent prediction blocks having the same MV by using the Merge mode in HEVC [3]. The MCP for chroma is synchronized to the luma MV, with the required chroma downsampling.

We compare the proposed technique to a scheme that uses a fixed quarter-sample grid and measure the performance gains. Finally, the test is extended to include extra motion search and signaling, to estimate the RD performance improvements obtainable when more encoder complexity is spent on the same architecture.

978-1-4577-2049-9/12/$26.00 ©2012 IEEE. PCS 2012, May 7-9, 2012, Kraków, Poland.
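
The idea of conditional signaling outlined in the introduction, where the decoder infers from shared information whether a refinement symbol follows the quarter-sample MVD, can be sketched as below. This is an illustrative sketch only. The rule in `refinement_is_signaled`, the threshold values, the `texture_at` callable and the flat list of symbols are hypothetical placeholders, not the conditions or the syntax defined in this paper. Evaluating the texture at the position given by the quarter-sample MV is likewise an assumption.

```python
# Placeholder thresholds per slice type (assumed values, for illustration only).
TEXTURE_THRESHOLD = {"P": 10.0, "B": 20.0}


def refinement_is_signaled(slice_type, num_hypotheses, ref_texture):
    """Hypothetical rule, evaluated identically at encoder and decoder."""
    if slice_type not in TEXTURE_THRESHOLD or num_hypotheses != 1:
        return False
    return ref_texture >= TEXTURE_THRESHOLD[slice_type]


def parse_motion(symbols, mv_pred_q, slice_type, num_hypotheses, texture_at):
    """Decoder-side parsing sketch; MVs are in quarter-sample units.

    `symbols` stands in for already entropy-decoded values, and
    `texture_at(mv_q)` is a hypothetical measure of the texture in the
    reference region addressed by the quarter-sample MV.
    Returns the quarter-sample MV, the refinement symbol (or None),
    and the symbols left unread.
    """
    it = iter(symbols)
    mvd_q = (next(it), next(it))  # the MVD is always read first
    mv_q = (mv_pred_q[0] + mvd_q[0], mv_pred_q[1] + mvd_q[1])
    refinement = None
    # No resolution flag is read: presence follows from shared information.
    if refinement_is_signaled(slice_type, num_hypotheses, texture_at(mv_q)):
        refinement = next(it)
    return mv_q, refinement, list(it)


# Same symbols, different reference texture: the refinement is read or skipped.
textured = parse_motion([3, -1, 2], (4, 4), "P", 1, lambda mv: 25.0)
flat = parse_motion([3, -1, 2], (4, 4), "P", 1, lambda mv: 1.0)
```

Since the encoder would evaluate the same predicate before writing, both sides agree on whether a refinement symbol is present, which is the property that allows the explicit resolution flag to be omitted.
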