MOTION-COMPENSATION USING VARIABLE-SIZE BLOCK-MATCHING WITH BINARY PARTITION TREES

Marc Servais and Theo Vlachos
CVSSP, University of Surrey
Guildford, GU2 7XH, United Kingdom
m.servais@surrey.ac.uk

Thomas Davies
BBC Research and Development
Kingswood Warren, Tadworth, Surrey, KT20 6NP, United Kingdom

ABSTRACT

A new approach to Variable Size Block Matching is proposed, based on the binary partitioning of blocks. If a particular block does not allow for accurate motion compensation, then it is split into two using the horizontal or vertical line that achieves the maximum reduction in motion compensation error. This method causes partitioning to occur along motion boundaries, thus substantially reducing blocking artifacts. In addition, small blocks are placed in regions of complex motion, while large blocks cover regions of uniform motion. The proposed technique provides significant gains in picture quality of 1.5 to 3.0 dB, when compared to Fixed Size Block Matching at the same total rate.

1. INTRODUCTION

Motion compensation techniques are an important part of almost all video codecs since they provide an effective way of exploiting the temporal redundancy between frames in an image sequence. Traditionally, Fixed Size Block Matching (FSBM) has been used to determine the motion of a block in the current frame relative to the reference frame(s) [1]. In FSBM, the blocks are a fixed size (e.g. 16 × 16) and are laid out in a regular grid spanning the frame. One side effect of FSBM is the introduction of blocking artifacts. These occur when a block covers an area where two or more types of motion are present, in which case it is not possible to represent the motion with just one motion vector. This can result in a large motion compensation error within such a block, as well as causing sharp spatial edges between neighbouring blocks in the motion compensated picture.
Chan et al. [2] introduced Variable Size Block Matching (VSBM), which allows for small blocks to cover areas of complex motion, while regions of uniform motion are spanned by large blocks. Their method partitions the frame into variable size blocks using a binary tree approach. Sullivan and Baker [3] proposed a related VSBM method which employs Lagrangian optimisation. This produces an optimal quad-tree, which minimises the distortion for a given bit-rate. Rhee et al. also used a VSBM approach to find the quad-tree that achieves the minimum error for a given number of blocks [4]. Note that in each of these methods, a block is divided into either halves or quarters of equal area.

The H.263 and MPEG-4 coding standards were the first to allow variable size blocks (16 × 16 or 8 × 8). In addition, H.264/AVC provides several different macroblock partitioning modes, ranging from 16 × 16 to 4 × 4 blocks.

This paper describes a VSBM approach which allows the binary splitting of blocks. However, blocks are not necessarily split into two equal halves. Instead, blocks are split using the horizontal or vertical line that achieves the maximum reduction in motion compensation error. This allows for partitioning a scene along motion boundaries, which enables effective motion compensation using relatively few blocks. As a result, significant gains in rate-distortion performance are possible.

This work has been funded by BBC Research and Development and the Centre for Vision Speech and Signal Processing, University of Surrey.

2. MOTION COMPENSATION ERROR SURFACES

In traditional block matching, the goal is to minimise the error between a block in the current frame and a displaced block in the reference frame. The error is usually measured in terms of either the sum of absolute error (SAE) or the sum of squared error (SSE). In effect, motion estimation amounts to finding the location of the minimum value on the error surface.
Because motion compensation error surfaces will prove useful later (in Sections 3 and 4), the process of generating them is discussed below in more detail. For each block, an error surface $E_{b,f}(u, v)$ is calculated, i.e. the SSE when block number $b$ is motion compensated by translating the corresponding block in reference frame number $f$ a distance $(u, v)$. For all points $(x, y)$ in block $b$, the SSE between the current frame, $I$, and a reference frame, $I_f$, is calculated according to the equation:

$$E_{b,f}(u, v) = \sum_{(x, y) \in \text{Block } b} \left[ I(x, y) - I_f(x + u, y + v) \right]^2$$
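
The SSE error surface for a single block can be sketched in a few lines of NumPy. This is a minimal illustration and not the authors' implementation: the function names, the rectangular block parameters, the symmetric integer search range and the choice to mark out-of-frame candidates as infinite are all assumptions made for this example. Frames are taken to be 2-D arrays indexed as `frame[y, x]`.

```python
import numpy as np

def sse_error_surface(current, reference, top, left, height, width, search_range):
    """Return the SSE surface E(u, v) for one block (hypothetical helper).

    The result has shape (2R+1, 2R+1) and is indexed as surface[v + R, u + R],
    where R = search_range. Candidates whose displaced block would fall outside
    the reference frame are left at infinity (an assumption of this sketch).
    """
    R = search_range
    block = current[top:top + height, left:left + width].astype(np.float64)
    surface = np.full((2 * R + 1, 2 * R + 1), np.inf)
    rows, cols = reference.shape
    for v in range(-R, R + 1):          # vertical displacement
        for u in range(-R, R + 1):      # horizontal displacement
            y0, x0 = top + v, left + u
            if y0 < 0 or x0 < 0 or y0 + height > rows or x0 + width > cols:
                continue                # displaced block leaves the frame
            candidate = reference[y0:y0 + height, x0:x0 + width].astype(np.float64)
            # Sum over (x, y) in the block of [I(x, y) - I_f(x + u, y + v)]^2
            surface[v + R, u + R] = np.sum((block - candidate) ** 2)
    return surface

def best_motion_vector(surface, search_range):
    """Return the (u, v) displacement at the minimum of the error surface."""
    row, col = np.unravel_index(np.argmin(surface), surface.shape)
    return int(col) - search_range, int(row) - search_range
```

Keeping the whole surface, and not just its minimum, is what makes it reusable later: the error of any candidate displacement can be looked up without recomputing it.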