Conditional Motion Vector Refinement for
Improved Prediction
Haricharan Lakshman¹, Christian Rudat¹, Matthias Albrecht¹,
Heiko Schwarz¹, Detlev Marpe¹ and Thomas Wiegand¹,²
¹ Fraunhofer Institute for Telecommunications, Heinrich Hertz Institute, Berlin, Germany.
² Image Communication Chair, Technical University of Berlin, Germany.
Abstract— Adapting the resolution of motion compensated
prediction in a video codec is considered in this paper. A new
motion search and signaling scheme for increasing the accuracy
of prediction without affecting the complexity of encoding is
proposed. It uses information available to both encoder and
decoder, e.g., the current slice type and the texture in reference
pictures, as a cue to control motion vector accuracy, and it
efficiently reuses motion information for predicting subsequent
blocks. Compared to a fixed quarter-sample resolution, an average
bit rate reduction of around 2.5% for P-pictures and 0.5% for
B-pictures is observed with no extra search.
I. INTRODUCTION
Motion compensated prediction (MCP) using fractional-
sample resolution is commonly used in video coding. The
encoder transmits motion information and quantized prediction
residuals, together with other side information, in the bitstream.
In H.264/AVC [1], the motion vectors (MVs) have a quarter-
sample resolution. A theoretical analysis of the efficiency
of fractional sample filtering is provided in [2]. That analysis
considers a structure in which the reference samples are first
passed through interpolation filters, which phase delay the signal,
and then through a Wiener filter. The interpolation filters provide
the required fractional shift, while the Wiener filter smooths
the prediction signal to counter the noise in the signals and the
displacement estimation errors. However, in the current HEVC
draft [3], an adaptive loop filter and a sample adaptive offset
filter are introduced after the DPCM reconstruction stage, and these
reduce noise in the reference pictures. Therefore, the role of
interpolation filtering is mainly to provide fractional shifts
without introducing much attenuation of the high frequency
components of the reference signals. In order to improve
the interpolation filtering, techniques such as increasing the tap
length [5], adapting the filter coefficients [6], and combined FIR
and IIR filtering [7] have been proposed. In most cases,
interpolation filters that induce the required phase delays over
larger passbands add implementation complexity.
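As a rough illustration of the trade-off between fractional shift and high-frequency attenuation, the following Python sketch evaluates the frequency response of the 6-tap half-sample luma filter of H.264/AVC, (1, -5, 20, 20, -5, 1)/32. The function names and the tap indexing convention are assumptions made for this sketch and are not taken from any codec implementation.

```python
import cmath
import math

# 6-tap half-sample luma interpolation filter of H.264/AVC (taps sum to 32).
HALF_SAMPLE_TAPS = (1, -5, 20, 20, -5, 1)


def frequency_response(taps, omega):
    """Response of the interpolator to the input exp(j*omega*n).

    Assumed convention for this sketch: tap k weights the integer sample at
    offset k - 2 from the current position, so the taps are centred on +0.5.
    """
    norm = sum(taps)
    return sum(t * cmath.exp(1j * omega * (k - 2)) for k, t in enumerate(taps)) / norm


def amplitude(taps, omega):
    """Magnitude of the frequency response at angular frequency omega."""
    return abs(frequency_response(taps, omega))


def shift_in_samples(taps, omega):
    """Displacement realised at frequency omega (phase divided by omega, omega > 0)."""
    return cmath.phase(frequency_response(taps, omega)) / omega


if __name__ == "__main__":
    for omega in (0.25 * math.pi, 0.5 * math.pi, 0.75 * math.pi, math.pi):
        print(f"omega = {omega:.3f}  amplitude = {amplitude(HALF_SAMPLE_TAPS, omega):.3f}")
```

The amplitude is one at zero frequency and falls to zero at the Nyquist frequency, while the realised shift stays at half a sample wherever the amplitude is positive. Filters that keep the amplitude close to one over a wider band generally need more taps or adaptation.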
An alternative approach is to keep the filter length constant,
but to increase the number of available filters for MCP. Such
a scheme would not increase the complexity of a decoder
because only a single filter of the same length would be used
for predicting each hypothesis. The larger set of filters could
be used to provide more choice for the encoder in selecting
a phase delay during MCP. In this regard, one-eighth sample
resolution for MVs has been explored in several works, e.g.,
[8], [9]. However, the reported gains due to a one-eighth
sample motion resolution are limited, and the higher resolution
can even degrade rate-distortion (RD) performance in some cases
because of the extra side information sent in the bitstream.
Additionally, one-eighth resolution requires extra search on the
encoder side compared to the commonly used one-quarter resolution,
which increases encoder complexity.
In this paper, we propose a new technique by modifying
the order of fractional sample motion estimation to achieve a
resolution higher than one-quarter sample without increasing
the complexity of motion estimation. A hierarchical motion
search is commonly used to accomplish fractional sample
search by first generating an integer sample MV, followed by
half and quarter-sample MVs. In such a scheme, we use the
half-sample search results to selectively search in a one-sixth
sample grid instead of a one-quarter grid. This keeps the
encoder complexity the same as for a one-quarter sample
search. Signaling a location in a full one-sixth grid, however,
needs more motion information. Therefore, we operate the
MV predictor in a regular quarter-sample resolution and first
transmit the motion vector difference (MVD) in the quarter-
sample resolution. Then we define a set of conditions using the
current slice type, texture in the referenced pictures, number of
hypotheses, etc., to control the transmission of MV refinement
information that can yield one-sixth sample resolution. The
new fractional positions are generated using FIR interpolation
filters as in the quarter-sample case. Therefore, the set of
possible amplitude responses and phase delays is enlarged,
giving the encoder more opportunity to choose the best
operating point for coding a given block. The decoder follows
the same steps in order to determine whether the refinement
information exists for each MV, without any explicit signaling
of MV resolution in the bitstream. The refinement information
is further reused for subsequent prediction blocks having the
same MV by using the Merge mode in HEVC [3]. The MCP
for chroma is synchronized to the luma MV, with the
required chroma downsampling. We compare the proposed
technique to a scheme that uses a fixed quarter-sample
grid and measure the performance gains. Finally, the test is
extended to include extra motion search and signaling to
estimate the RD performance improvements obtainable when more
encoder complexity is allowed for the same architecture.
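The search order described above can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the codec's actual search: the cost function is a toy squared-distance measure, and `refinement_enabled`, its `min_activity` threshold, and the specific rule it implements are hypothetical stand-ins for the conditions that encoder and decoder both evaluate. The point of the sketch is only that replacing the quarter-sample stage with a one-sixth stage leaves the number of evaluated candidates unchanged.

```python
from fractions import Fraction
from itertools import product


def refinement_enabled(slice_type, num_hypotheses, ref_texture_activity,
                       min_activity=25.0):
    """Hypothetical condition, computable identically at encoder and decoder.

    The rule and the threshold are placeholders for illustration only; they
    are not the conditions used in the paper.
    """
    return (slice_type in ("P", "B")
            and num_hypotheses == 1
            and ref_texture_activity >= min_activity)


def candidates_around(center, step):
    """The centre position plus its 8 neighbours at distance `step`."""
    cx, cy = center
    return [(cx + dx * step, cy + dy * step)
            for dx, dy in product((-1, 0, 1), repeat=2)]


def hierarchical_search(cost, integer_mv, use_sixth):
    """Half-sample stage, then a one-sixth or a one-quarter stage.

    Returns the best MV (as exact fractions of a sample) and the number of
    candidate positions evaluated, including the re-evaluated centre.
    """
    best = (Fraction(integer_mv[0]), Fraction(integer_mv[1]))
    final_step = Fraction(1, 6) if use_sixth else Fraction(1, 4)
    evaluated = 0
    for step in (Fraction(1, 2), final_step):
        cands = candidates_around(best, step)
        evaluated += len(cands)
        best = min(cands, key=cost)
    return best, evaluated


def toy_cost(true_mv):
    """Toy cost: squared distance to a known displacement (assumption)."""
    tx, ty = true_mv
    return lambda mv: (mv[0] - tx) ** 2 + (mv[1] - ty) ** 2


if __name__ == "__main__":
    cost = toy_cost((Fraction(1, 3), Fraction(-1, 6)))
    use_sixth = refinement_enabled("P", 1, ref_texture_activity=40.0)
    print(hierarchical_search(cost, (0, 0), use_sixth))
    print(hierarchical_search(cost, (0, 0), False))
```

In this toy setting both variants evaluate the same number of positions, and the one-sixth stage can land on displacements such as 1/3 that the quarter-sample grid can only approximate.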
978-1-4577-2049-9/12/$26.00 ©2012 IEEE PCS 2012
May 7-9, 2012, Kraków, Poland
2012 Picture Coding Symposium