Motion compensation based on tangent distance prediction for video compression

Jonathan Fabrizio (a,*), Séverine Dubuisson (b), Dominique Béréziat (b)

(a) LRDE-EPITA, 14-16, rue Voltaire, F-94276 Le Kremlin-Bicêtre Cedex, France
(b) Université Pierre et Marie Curie, Laboratoire d'Informatique de Paris 6, 4 place Jussieu, 75252 Paris Cedex, France

Article history: Received 17 December 2010; Accepted 3 December 2011; Available online 16 December 2011

Keywords: Video compression; Motion compensation; Tangent distance; Theora

Abstract

We present a new algorithm for motion compensation that uses a motion estimation method based on the tangent distance. The method is compared with a block-matching approach in various common situations. Whereas block-matching algorithms usually predict only the positions of blocks over time, our method also predicts the evolution of the pixels within these blocks, which drastically decreases the prediction error. The method is implemented in the Theora codec, showing that this algorithm improves the codec's performance.

© 2011 Elsevier B.V. All rights reserved.

1. Introduction

Video compression refers to reducing the quantity of data used to represent video images. A video is a sequence of frames (images) that are related along the temporal and spatial dimensions: two consecutive frames are likely to be similar, and the only observed changes are assumed to be due either to the displacements of objects or of the camera, to changes of illumination, or to noise. To reduce the amount of data to be transmitted in an image sequence, it is therefore necessary to identify these spatio-temporal redundancies and to exploit them by defining predictable properties. Given the data to encode, these properties are used to make predictions, and only the errors between the original and the predicted data are sent.
This technique by itself does not reduce the amount of data (for video compression, we transmit an image of errors that contains as many pixels as the original image) but, combined with statistical entropy coding, it reduces the data size. Indeed, these errors have a smaller dynamic range than the original pixel values, hence a smaller entropy, which decreases the number of bits needed to encode the data.

We distinguish two main types of prediction: temporal and spatial. Spatial, or intraframe, prediction uses only information from the current frame: pixels of the frame buffer, visited in raster order, are assumed to be similar. By considering only pixels previously examined (and thus already coded) in a specific neighborhood, the coder predicts the value of the current pixel. The main difficulty in such approaches is the choice of the weighting coefficients for the pixels of the neighborhood. Usually, spatial prediction is adapted to the image content (edges, flat areas, etc.). Among the large number of existing spatial predictors, a well-known one is the median adaptive predictor (MAP) [24]. MAP selectively uses three linear predictors based on a simple function of the surrounding values and gives a good prediction even in the presence of edge features. This predictor has been embedded into the LOCO-I algorithm [37]. Spatial prediction is used in numerous codecs, either in the spatial domain (H.264/AVC [38,30]) or in the frequency domain (Theora codec [8]). Temporal prediction, also called interframe prediction, uses earlier and/or later frames of the sequence to predict the current frame.

doi:10.1016/j.image.2011.12.001
Corresponding author. Tel.: +33 1 53 14 59 40; fax: +33 1 53 14 59 13. E-mail address: jonathan.fabrizio@lrde.epita.fr (J. Fabrizio).
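The entropy argument above can be illustrated with a small sketch (ours, not from the paper): for two similar synthetic "frames", the frame-difference residual has a much smaller Shannon entropy than the raw pixel values, and therefore needs fewer bits per pixel after entropy coding.

```python
# Illustrative sketch: why prediction residuals compress better.
# We compare the Shannon entropy of raw pixel values with that of the
# frame-difference residual for two similar synthetic "frames".
from collections import Counter
import math

def entropy(values):
    """Shannon entropy (bits/symbol) of a sequence of integer values."""
    counts = Counter(values)
    n = len(values)
    return -sum(c / n * math.log2(c / n) for c in counts.values())

# A smooth 1-D "scanline" and a slightly changed version of it,
# standing in for two consecutive frames.
frame_t  = [100 + (i % 32) * 4 for i in range(256)]
frame_t1 = [v + 2 for v in frame_t]           # small global change

residual = [a - b for a, b in zip(frame_t1, frame_t)]

print(f"entropy of raw frame: {entropy(frame_t1):.2f} bits/pixel")  # 5.00
print(f"entropy of residual:  {entropy(residual):.2f} bits/pixel")  # 0.00
```

Here the residual is constant, so its entropy collapses to zero; in real video the residual is merely concentrated around zero, but the same effect drives the bit-rate reduction.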
Signal Processing: Image Communication 27 (2012) 153–171
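The median adaptive predictor cited in the introduction (the MED predictor of LOCO-I/JPEG-LS) can be sketched as follows; the function name is ours, and this is an illustration rather than the paper's own code.

```python
# Sketch of the median adaptive (MED) predictor used in LOCO-I/JPEG-LS.
# It switches between three linear predictors depending on the causal
# neighborhood, which lets it follow edges.
def med_predict(a, b, c):
    """Predict a pixel from its left (a), above (b) and above-left (c)
    neighbors; equivalent to median(a, b, a + b - c)."""
    if c >= max(a, b):
        return min(a, b)   # c is a local maximum: pick the darker neighbor
    if c <= min(a, b):
        return max(a, b)   # c is a local minimum: pick the brighter neighbor
    return a + b - c       # smooth region: planar prediction

# Vertical edge: dark left neighbor, bright above and above-left.
print(med_predict(10, 200, 200))    # -> 10, the prediction follows the edge
# Smooth gradient: planar prediction a + b - c.
print(med_predict(100, 104, 102))   # -> 102
```

Only already-decoded neighbors are used, so the decoder can reproduce the same prediction and only the residual needs to be transmitted.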