A Wavelet Transform Image Sequence Coder Using Nonstationary Displacement Estimation

Mark R. Banham, James C. Brailean and Aggelos K. Katsaggelos
Department of Electrical Engineering and Computer Science
McCormick School of Engineering and Applied Science
Northwestern University, Evanston, IL 60208-3118, USA

Abstract

In this paper, we present a novel coding technique which makes use of the nonstationary characteristics of an image sequence displacement field to estimate and encode motion information. In addition, we develop a wavelet transform approach using cross-scale vector quantization to encode single frames during periods of high motion and scene changes. The objective of this design is to demonstrate the coding potential of a newly developed motion estimator called the Compound Linearized MAP (CLMAP) estimator. This estimator produces motion vectors which may be regenerated at the decoder from a coarsely quantized error term created in the encoder. Because the motion estimates are highly accurate, both a separately coded displaced frame difference (DFD) and separately coded motion vectors can be eliminated. We exploit both the advantages of the nonstationary motion estimator and the edge-preserving quality of the wavelet-based still frame coder to improve the visual quality of reconstructed video-conferencing image sequences at low bit rates.

1 Introduction

Image sequence compression is generally considered a very effective tool in many areas of digital video because of its ability to exploit the temporal redundancies present between frames. Motion estimation is frequently used to aid in compressing sequences to very low bit rates (often as low as 0.1 or 0.2 bits/pixel for some applications such as videophone and video-conferencing).
By describing the displacement of pixels from one frame to the next with a set of motion vectors, one can simply encode the vectors and a predicted frame error term which allows the decoder to reconstruct the original image from the previous decoded frame. Inherent in any such scheme, however, is the need to compactly represent this set of motion vectors, or displacement vector field (DVF). The prediction error, generally referred to as the displaced frame difference (DFD), must also be encoded. Depending on the type of motion estimation performed in the encoder, there are a variety of techniques for handling the motion vector problem, including quantizing the vectors and transmitting them as overhead, or predicting the DVF from previous frames in the decoder.

In this paper, we present a new method for solving the motion vector coding problem by exploiting the special structure of the recently developed Compound Linearized MAP (CLMAP) motion estimator [1]. The CLMAP estimator is based on a Vector Coupled Gauss-Markov (VCGM) model of the DVF. The VCGM model views the DVF as a two-level random field: the upper level consists of the observable displacement vectors, while the lower level, called the line process, represents the structure of the DVF. This nonstationary motion estimator and the accompanying coder are introduced in Sec. 2.

While the nonstationary DVF characteristics require special treatment of motion information, it is also important to note that still frames themselves have a very nonstationary structure. It has long been recognized that the encoding of digital images should consider rapidly and slowly varying edges in combination with smooth homogeneous regions within the image [2]. These components of the intensity field of the image are best understood and processed separately. We exploit this knowledge in our choice of an intra-frame coding technique.
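The relationship between the DVF, the motion-compensated prediction, and the DFD can be made concrete with a minimal sketch. This is an illustration only, not the CLMAP estimator described above: it assumes a dense per-pixel DVF is already available and simply forms the prediction and the DFD; all function and variable names are ours.

```python
# Illustrative sketch (not the paper's CLMAP estimator): given the previous
# decoded frame and a dense displacement vector field (DVF), form the
# motion-compensated prediction and the displaced frame difference (DFD).
# Frames are small lists-of-lists of grayscale values.

def predict_and_dfd(prev_frame, cur_frame, dvf):
    h, w = len(cur_frame), len(cur_frame[0])
    pred = [[0] * w for _ in range(h)]
    dfd = [[0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            dy, dx = dvf[y][x]               # where pixel (y, x) came from
            sy = min(max(y + dy, 0), h - 1)  # clamp to frame borders
            sx = min(max(x + dx, 0), w - 1)
            pred[y][x] = prev_frame[sy][sx]
            dfd[y][x] = cur_frame[y][x] - pred[y][x]
    return pred, dfd

# Toy example: the current frame is the previous one shifted right by one
# pixel, so a uniform DVF of (0, -1) predicts it exactly and the DFD is zero.
prev = [[0, 0, 9, 0],
        [0, 0, 9, 0],
        [0, 0, 9, 0]]
cur  = [[0, 0, 0, 9],
        [0, 0, 0, 9],
        [0, 0, 0, 9]]
dvf = [[(0, -1)] * 4 for _ in range(3)]
pred, dfd = predict_and_dfd(prev, cur, dvf)
print(all(v == 0 for row in dfd for v in row))  # True: nothing left to code
```

When the DVF captures the true motion, as here, the DFD carries no energy; the coding cost then lies entirely in representing the vectors themselves, which is the problem the CLMAP structure is designed to address.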
It is necessary for an image sequence coder to have an effective intra-frame compressor in order to ensure a high-quality reconstruction of the initial or "anchor" frame in any segment of a sequence. This terminology derives from the MPEG coding standard, which includes provisions for anchor frames as well as "interpolated" or "bi-directional" frames and "predicted" frames [3]. Each of these types of frames will also play a part in our coder, as we will describe later. Given that an emerging technique for still image compression, the wavelet transform, has been fairly well investigated under the guise of both subband coding [4] and multiresolution analysis [5, 6], we exploit some of the more recent investigations in this area for our coder. The 2-D wavelet transform, perhaps best described in the context of multiresolution

SPIE Vol. 1818 Visual Communications and Image Processing '92
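As a concrete instance of the multiresolution analysis mentioned above, the sketch below performs one level of a 2-D wavelet decomposition using the Haar filters. The Haar wavelet is chosen here only because it is the simplest; the paper's actual filter bank is not implied, and all names are ours.

```python
# One-level 2-D wavelet decomposition with the Haar filters (illustrative;
# not the filter bank used in the paper).  Rows are filtered and downsampled
# first, then columns, yielding the four subbands LL, LH, HL, HH.

def haar_step(seq):
    """One level of the 1-D Haar transform: pairwise averages and differences."""
    avg = [(seq[i] + seq[i + 1]) / 2 for i in range(0, len(seq), 2)]
    dif = [(seq[i] - seq[i + 1]) / 2 for i in range(0, len(seq), 2)]
    return avg, dif

def haar2d_level(image):
    """One 2-D level on an even-sized image: filter rows, then columns."""
    lo_rows, hi_rows = [], []
    for row in image:
        a, d = haar_step(row)
        lo_rows.append(a)
        hi_rows.append(d)

    def cols(mat):
        low, high = [], []
        for col in zip(*mat):            # iterate over columns
            a, d = haar_step(list(col))
            low.append(a)
            high.append(d)
        # transpose back to row-major order
        return [list(r) for r in zip(*low)], [list(r) for r in zip(*high)]

    ll, lh = cols(lo_rows)
    hl, hh = cols(hi_rows)
    return ll, lh, hl, hh

# A constant (perfectly smooth) image puts all its energy in the LL subband;
# the detail subbands, which capture edges, are exactly zero.
img = [[8.0] * 4 for _ in range(4)]
ll, lh, hl, hh = haar2d_level(img)
print(ll)                                          # [[8.0, 8.0], [8.0, 8.0]]
print(lh == hl == hh == [[0.0, 0.0], [0.0, 0.0]])  # True
```

The separation into a smooth low-pass band and edge-dominated detail bands is what lets a wavelet coder treat homogeneous regions and edges differently, which is the edge-preserving property exploited in the coder described here.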