1752 IEEE JOURNAL ON SELECTED AREAS IN COMMUNICATIONS, VOL. 15, NO. 9, DECEMBER 1997

Predictive RD Optimized Motion Estimation for Very Low Bit-Rate Video Coding

Faouzi Kossentini, Member, IEEE, Yuen-Wen Lee, Mark J. T. Smith, Fellow, IEEE, and Rabab K. Ward

Abstract— Predictive rate-distortion (RD) optimized motion estimation techniques are studied and developed for very low bit-rate video coding. Four types of predictors are studied: mean, weighted mean, median, and statistical mean. The weighted mean is obtained using conventional linear prediction techniques. The statistical mean is obtained using a finite-state machine modeling method based on dynamic vector quantization. With prediction, the motion vector search can be constrained to a small area. The effective search area is reduced further by varying its size according to the local statistics of the motion field. This is done by using a Lagrangian as the search matching measure and by imposing probabilistic models during the search process. The proposed motion estimation techniques are analyzed within a simple DCT-based video coding framework, where an RD criterion selects one of three coding modes for each 8 × 8 block: motion only, motion-compensated prediction and DCT, and intra-DCT. Experimental results indicate that our techniques yield very good computation–performance tradeoffs. When such techniques are applied to an RD optimized H.263 framework at very low bit rates, the resulting H.263-compliant video coder outperforms the H.263 TMN5 coder in both compression performance and computational cost.

Index Terms— DCT-based coding, low bit-rate video coding, motion estimation, predictive motion estimation, rate-distortion optimized motion estimation.

I. INTRODUCTION

A variety of motion estimation algorithms have been developed for very low bit-rate video coding.
However, the block-matching algorithm (BMA) [1] stands out as the most popular and the simplest in concept, design, and implementation. In fact, many new BMA-based video compression algorithms allow QCIF-resolution video to be transmitted or stored with acceptable quality at bit rates as low as 16 kbit/s [2]–[5]. Most notable are the H.263-based video coders [6], which have recently been shown to outperform video coders that use more complex object- and model-based motion estimation algorithms. Many H.263-based video coder implementations, such as Telenor's TMN5 [7], adopt a two-step BMA-based motion estimation algorithm. The first step is an integer-pel accuracy full-search BMA (FS-BMA). The second step aims to improve the estimation accuracy by producing motion vector estimates with 1/2-pel accuracy.

Manuscript submitted September 1, 1996; revised March 1, 1997. This work was supported by the Natural Sciences and Engineering Research Council of Canada under Grant OGP-0187668 and by NASA. This paper was presented in part at the 1997 International Conference on Image Processing.
F. Kossentini, Y.-W. Lee, and R. K. Ward are with the Department of Electrical Engineering, University of British Columbia, Vancouver, BC, V6T 1Z4 Canada.
M. J. T. Smith is with the School of Electrical and Computer Engineering, Georgia Institute of Technology, Atlanta, GA 30332-0250 USA.
Publisher Item Identifier S 0733-8716(97)07699-3.

The two-step motion estimation algorithm above has many problems. The FS-BMA is well known for its large computational requirements, which have motivated much research. That research has produced more efficient algorithms [1], [3], such as log search, three-step search, cross search, conjugate gradient search, hierarchical search, and block subsampling. However, most of these algorithms can quickly become trapped in local minima, which significantly degrades motion vector estimation performance.
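The following is a minimal sketch of the integer-pel first step of the two-step search described above. It assumes the sum of absolute differences (SAD) as the matching measure, 8 × 8 blocks, and a ±7-pel search range. The function names `sad` and `full_search` and these parameter values are illustrative assumptions and are not taken from TMN5. The 1/2-pel refinement step is omitted.

```python
import random


def sad(cur, ref, bx, by, dx, dy, n):
    """Sum of absolute differences between the n-by-n block of `cur` whose
    top-left corner is (bx, by) and the block of `ref` displaced by (dx, dy)."""
    return sum(
        abs(cur[by + y][bx + x] - ref[by + dy + y][bx + dx + x])
        for y in range(n)
        for x in range(n)
    )


def full_search(cur, ref, bx, by, n=8, p=7):
    """Integer-pel full-search block matching (illustrative FS-BMA sketch).

    Every displacement in [-p, p] x [-p, p] whose reference block lies inside
    the frame is tested, and the one with the smallest SAD is returned.
    The zero vector is evaluated first, and ties keep the earlier candidate.
    """
    height, width = len(ref), len(ref[0])
    best_mv, best_cost = (0, 0), sad(cur, ref, bx, by, 0, 0, n)
    for dy in range(-p, p + 1):
        for dx in range(-p, p + 1):
            # Skip candidates whose reference block falls outside the frame.
            inside = (0 <= bx + dx and bx + dx + n <= width
                      and 0 <= by + dy and by + dy + n <= height)
            if not inside:
                continue
            cost = sad(cur, ref, bx, by, dx, dy, n)
            if cost < best_cost:
                best_mv, best_cost = (dx, dy), cost
    return best_mv, best_cost


# Usage: a synthetic 40x40 frame pair in which the current frame is the
# reference shifted so that the true motion vector is (dx, dy) = (2, -1).
rng = random.Random(1)
W = H = 40
ref = [[rng.randrange(256) for _ in range(W)] for _ in range(H)]
cur = [[ref[(y - 1) % H][(x + 2) % W] for x in range(W)] for y in range(H)]
mv, cost = full_search(cur, ref, bx=16, by=16)
print(mv, cost)  # (2, -1) 0
```

With p = 7, the nested loop evaluates (2p + 1)^2 = 225 candidates per block. Each candidate costs n^2 = 64 absolute differences, for 14,400 per block. This cost motivates the fast and predictive searches discussed here.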
Moreover, the FS-BMA performs poorly under illumination changes (in intensity and reflectance), nontranslational motion such as zoom and rotation, scene changes, and occlusions. These weaknesses, together with the FS-BMA's sensitivity to noise in the input video, produce a nonsmooth motion field that costs many precious bits in very low bit-rate video coding applications. Finally, estimating motion vectors with 1/2-pel accuracy increases both the complexity and the bit rate, yet yields a relatively insignificant improvement in video quality.

In this paper, we present predictive rate-distortion (RD) optimized motion estimation techniques that employ several predictors and search methods. The proposed techniques simultaneously reduce the number of computations substantially, produce a smoother motion field, and yield better reproduction quality. The techniques are analyzed and compared in the context of a simple DCT-based video coding framework that uses only 8 × 8 blocks and allows only three coding modes: motion only, motion-compensated prediction and DCT, and intra-DCT. An RD criterion, expressed by the Lagrangian $J = D + \lambda R$, is used to select among these coding modes. Here $D$ is the distortion, $R$ is the rate, and $\lambda \ge 0$ is the Lagrange multiplier.

In very low bit-rate video coding applications such as video telephony, the motion field is highly structured and slowly varying. Moreover, the motion vectors are usually limited in magnitude. Significant computation and coding gains can therefore be achieved by exploiting the strong spatio-temporal dependencies among the motion vectors. Memory has long been incorporated into the motion vector coding process, but very few researchers have proposed exploiting it to simplify motion estimation. Recently, predictive motion estimation [8]–[13] has become an important research area.
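The ideas in the preceding paragraphs can be combined in a short sketch:

- predict a block's motion vector from its neighbors;
- search only a small window around the prediction;
- score candidates with a Lagrangian of the form J = D + λR instead of distortion alone.

This is only an illustration, and its assumptions are ours, not the paper's. The component-wise median of the left, above, and above-right neighbors serves as the predictor; the mean, weighted mean, and statistical mean studied here would be drop-in replacements. The distortion D is the SAD. The rate R of the motion vector difference is approximated by signed Exp-Golomb codeword lengths. The names `median_predictor`, `se_bits`, and `predictive_rd_search`, the window radius r = 2, and λ = 4 are all hypothetical.

```python
import random


def sad(cur, ref, bx, by, dx, dy, n):
    """SAD between the n-by-n block of `cur` at top-left (bx, by) and the
    block of `ref` displaced by (dx, dy); used here as the distortion D."""
    return sum(
        abs(cur[by + y][bx + x] - ref[by + dy + y][bx + dx + x])
        for y in range(n)
        for x in range(n)
    )


def se_bits(v):
    """Length of the signed Exp-Golomb codeword for integer v.
    Hypothetical stand-in for the rate of one motion vector component."""
    k = 2 * v - 1 if v > 0 else -2 * v  # maps 0, 1, -1, 2, -2, ... to 0, 1, 2, 3, 4, ...
    return 2 * (k + 1).bit_length() - 1


def median_predictor(left, above, above_right):
    """Component-wise median of three neighboring motion vectors."""
    return tuple(sorted(c)[1] for c in zip(left, above, above_right))


def predictive_rd_search(cur, ref, bx, by, pred, lam, n=8, r=2):
    """Search a (2r+1) x (2r+1) window centred on the predicted vector `pred`
    and return (mv, J) for the candidate minimising J = D + lam * R, where R
    is the number of bits spent on the motion vector difference (mv - pred).
    Returns None if no candidate block lies inside the reference frame."""
    height, width = len(ref), len(ref[0])
    best = None  # (J, (dx, dy))
    for dy in range(pred[1] - r, pred[1] + r + 1):
        for dx in range(pred[0] - r, pred[0] + r + 1):
            if not (0 <= bx + dx and bx + dx + n <= width
                    and 0 <= by + dy and by + dy + n <= height):
                continue
            d = sad(cur, ref, bx, by, dx, dy, n)
            rate = se_bits(dx - pred[0]) + se_bits(dy - pred[1])
            j = d + lam * rate
            if best is None or j < best[0]:
                best = (j, (dx, dy))
    return None if best is None else (best[1], best[0])


# Usage: a synthetic 40x40 frame pair whose true motion is (dx, dy) = (2, -1),
# with neighboring vectors that predict it only approximately.
rng = random.Random(1)
W = H = 40
ref = [[rng.randrange(256) for _ in range(W)] for _ in range(H)]
cur = [[ref[(y - 1) % H][(x + 2) % W] for x in range(W)] for y in range(H)]
pred = median_predictor(left=(1, -1), above=(1, 0), above_right=(2, -1))
mv, cost = predictive_rd_search(cur, ref, bx=16, by=16, pred=pred, lam=4)
print(pred, mv, cost)  # (1, -1) (2, -1) 16
```

With r = 2, the window holds 25 candidates. A full search over ±p pels holds (2p + 1)^2 candidates, or 225 for p = 7. The rate term penalizes vectors far from the prediction, which is one way a Lagrangian matching measure can favor a smoother motion field. In this example the true vector still wins because its distortion is zero, so J = 4 × (3 + 1) = 16.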
This paper studies the complexity and performance of two linear predictors (mean and weighted mean) and two nonlinear predictors (median and statistical mean) when applied to motion estimation in the context of very low bit-rate video coding. Conventional linear prediction techniques are used to obtain the mean and weighted mean. The statistical mean is estimated using conditional probabilities obtained via a finite-state machine (FSM) that
0733-8716/97$10.00 © 1997 IEEE