Motion Estimation using Dynamic Programming with Selective Path Search

Minglun Gong
Department of Math and Computer Science, Laurentian University, Sudbury, ON, Canada

Abstract

A novel dynamic programming-based motion estimation algorithm is presented in this paper. During matching cost calculation, the algorithm keeps the cost values for only the best N candidates, which considerably reduces the required memory space. In addition, a new path searching approach is applied. When searching for the optimal path from pixel p, the algorithm considers both the nearby candidates and the best N candidates at pixel p-1. As a result, better estimations can be produced around motion boundaries. The experimental results show that the motions estimated by considering all the candidates are only slightly better than those estimated by selectively considering a small number of candidates, even though the former approach requires significantly more computational time and memory space.

Keywords: Motion estimation, Optical flow, Dynamic programming.

1 Introduction

The accurate measurement of optical flow is important in many applications, such as structure-from-motion and video compression. Previous work in motion estimation is surveyed by Barron et al. [2]. As they suggested, different techniques can be classified into four categories: gradient-based, matching-based, energy-based, and phase-based approaches. Even though the gradient-based approach has been a more active research topic in recent years [1, 7], the matching-based approach does have its advantages: (1) it can be easily adapted to color video sequences; (2) it does not involve spatiotemporal derivative estimation, which is sensitive to noise and aliasing effects.

Dynamic programming (DP) is an efficient technique for solving certain optimization problems. Several approaches have applied the DP technique to motion estimation [3, 4, 6].
In Quenot’s approach [3, 4], the DP is performed alternately on horizontal and vertical image strips. The spacing and width of the strips are gradually reduced to refine the matching results. In Sun’s approach [6], the DP optimization is conducted for different scanlines separately. The best path found is limited to pixel-level accuracy. Sub-pixel accuracy is obtained using an additional surface fitting process.

When searching for the best path from pixel p, the above approaches only consider the nearby candidates. This ensures that the estimated optical flows satisfy the smoothness constraint. However, it is likely to produce over-smoothed results, especially around the boundaries of fast-moving objects.

The main originality of the presented algorithm is to selectively consider other candidates during the best path search. That is, when we search for the optimal path from pixel p, the best N candidates at pixel p-1 are considered as well. The smoothness constraint is enforced by penalizing large velocity changes, rather than forbidding them. As a result, more accurate estimations can be produced.

Besides the approaches mentioned above, our algorithm is also related to the scanline optimization technique, which was proposed for solving stereo vision problems [5]. In both approaches, the smoothness constraint is enforced using a discontinuity cost term. The key difference between the two is that our approach considers only some of the candidates, while scanline optimization considers all of them. As a result, our approach requires much less computation and memory space.

2 The presented algorithm

2.1 Matching cost calculation

Being a matching-based approach, the presented algorithm needs a way to evaluate the matching cost of a given pixel (p,q) under a given velocity d.
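As a rough illustration of such a per-pixel, per-velocity cost, the sketch below sums absolute intensity differences over a small spatiotemporal window and truncates the result at a maximum value. It is a minimal sketch under stated assumptions, not the paper's implementation: the function name `truncated_sad`, the helper `pattern`, the row-major `frames[k][y][x]` layout, integer velocities, and clamping at the image border are all choices made for this example only.

```python
def truncated_sad(frames, p, q, d, w=2, f=2, c_max=float("inf")):
    """Truncated SAD cost of pixel (p, q) under velocity d = (dx, dy).

    Assumptions (hypothetical, for illustration only):
      - frames is a list of 2*f + 1 grayscale images, each a list of rows,
        and frames[f] is the reference frame (frame 0);
      - p is the column index and q the row index;
      - dx and dy are integer displacements per frame.
    """
    dx, dy = d
    ref = frames[f]
    height, width = len(ref), len(ref[0])

    def pixel(img, x, y):
        # Clamp coordinates to the image border (an assumption of this sketch).
        x = min(max(x, 0), width - 1)
        y = min(max(y, 0), height - 1)
        return img[y][x]

    cost = 0
    for k in range(-f, f + 1):            # temporal window
        other = frames[f + k]
        for j in range(-w, w + 1):        # spatial window, rows
            for i in range(-w, w + 1):    # spatial window, columns
                a = pixel(ref, p + i, q + j)
                b = pixel(other, p + i + k * dx, q + j + k * dy)
                cost += abs(a - b)
    # Truncate the cost to approximate a robust measure.
    return min(cost, c_max)


def pattern(x, y):
    # Arbitrary synthetic texture used only for the demonstration below.
    return (3 * x * x + 5 * y) % 17


# Synthetic 16x16 sequence in which the texture moves one pixel to the
# right per frame; frames[2] is the reference frame.
frames = [[[pattern(x - k, y) for x in range(16)] for y in range(16)]
          for k in range(-2, 3)]

cost_true = truncated_sad(frames, 8, 8, (1, 0))             # true velocity
cost_zero = truncated_sad(frames, 8, 8, (0, 0))             # wrong velocity
cost_capped = truncated_sad(frames, 8, 8, (0, 0), c_max=10)
```

In this synthetic sequence the true velocity (1, 0) gives a cost of zero, while the wrong velocity (0, 0) gives a positive cost that is capped when `c_max` is set. The triple loop is written for readability; it recomputes overlapping window sums for every pixel and is not meant to be efficient.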
Here, different similarity/dissimilarity measures can be used, such as zero-mean normalized cross-correlation (ZNCC), sum of squared differences (SSD), and sum of absolute differences (SAD). In the experiments shown in this paper, SAD is used for its low computational cost. The corresponding equation is:

\[
C'(p, q, \mathbf{d}) = \sum_{k=-f}^{f} \; \sum_{-w \le i,\, j \le w} \left| I_0(p+i,\; q+j) - I_k(p+i+k\,d_x,\; q+j+k\,d_y) \right|
\]

where \(\mathbf{d} = (d_x, d_y)\) is the velocity, \(w\) is the window radius in the space domain, \(f\) is half of the window size in the time domain, and \(I_k\) is the image at frame \(k\). In this paper, we set \(w = f = 2\), resulting in 5 frames and 5×5 windows.

In addition, in order to approximate the effect of a robust measure, we truncate the calculated SAD values to a maximal value, \(C_{\max}\). That is:

\[
C(p, q, \mathbf{d}) =
\begin{cases}
C_{\max} & \text{if } C'(p, q, \mathbf{d}) > C_{\max} \\
C'(p, q, \mathbf{d}) & \text{otherwise}
\end{cases}
\]

The SAD measure can be efficiently calculated using a box filter [6]. Given the horizontal and vertical velocity ranges of the motion sequence, \([-z_x, +z_x]\) and \([-z_y, +z_y]\), we can quantize the continuous velocity space into \(L_x\) and \(L_y\) levels in the two directions. For each quantized velocity, the matching costs for all pixels in the image can be calculated in a single pass. The complexity is

0-7695-2128-2/04 $20.00 (C) 2004 IEEE