Augmenting analytic SFM filters with frame-to-frame features

Adel Fakih a, Daniel Asmar b, John Zelek a

a Department of Systems Design Engineering, University of Waterloo, 200 University Avenue West, Waterloo, Ontario N2L 3G1, Canada
b Department of Mechanical Engineering, American University of Beirut, P.O. Box 110236, Riad El Solh, Beirut 1107 2020, Lebanon

This paper has been recommended for acceptance by Sven Dickinson.
Corresponding author: A. Fakih. E-mail addresses: afakih@uwaterloo.ca (A. Fakih), dasmar@aub.edu.lb (D. Asmar), jzelek@uwaterloo.ca (J. Zelek).

Article history: Received 25 May 2013; Accepted 28 August 2014; Available online 16 September 2014.

Keywords: Structure from motion; Filtering; Complexity reduction; Frame-to-frame features

Abstract

In Structure From Motion (SFM), image features are matched either across an extended number of frames or only between pairs of consecutive frames. Traditionally, SFM filters have used only one of these two matching paradigms, with the Long Range (LR) feature technique being the more popular because features matched across multiple frames provide stronger constraints on structure and motion. Nevertheless, Frame-to-Frame (F2F) features possess the desirable property of being abundant, owing to the large similarity between closely spaced frames. Although the use of such features has been limited mostly to the determination of inter-frame camera motion, we argue that significant improvements can be attained in online filter-based SFM by integrating F2F features into filters that use LR features. The contributions of this paper are twofold. First, it presents a new method that enables the incorporation of F2F information into any analytic filter with minimal change to the existing filter; our results show that doing so yields large gains in the accuracy of both the structure and motion estimates. Second, through mathematical simplifications in the filter, we reduce the computational burden of F2F integration by two orders of magnitude, thereby enabling a real-time implementation. Experimental results on real and simulated data demonstrate the success of the proposed approach.

© 2014 Elsevier Inc. All rights reserved.

1. Introduction

Two categories of image features can be used to recover the 3D motion and/or the scene structure from video images (Fig. 1):

1. Long Range (LR) features, which are tracked over an extended number of frames. This type of feature introduces 3D-to-2D constraints linking the scene structure and the 3D motion to the projections of the features in the images. These constraints allow the recovery of both the 3D motion and the scene structure, and most approaches use this category of features [4,20,21,14,3,24].

2. Frame-to-Frame (F2F) features, which are matched only between pairs of consecutive frames. This type of feature is generally not robust enough for estimating the scene structure, because each feature provides, for a given 3D point, only two image projections in two spatially close frames. For this reason, such features have traditionally been used mostly to impose constraints on the motion between the two corresponding frames. Such constraints constitute a measurement of the differential motion (the velocity, or incremental change of motion), in contrast to LR features, which provide an "absolute" measurement of the motion and structure. The reliability of this type of differential measurement stems from the large number of F2F features that can be matched between consecutive frames. Both constraint types are sketched in the equations below.
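To make the two kinds of constraints concrete, the following display sketches their standard forms in generic notation chosen here for illustration (the symbols are not necessarily those adopted later in the paper). For an LR feature whose 3D point \(\mathbf{X}_j\) is tracked through frames \(i = 1,\dots,N_j\) with poses \((R_i, \mathbf{t}_i)\), intrinsic matrix \(K\) and perspective division \(\pi(\cdot)\), each observation obeys the projection constraint
\[
  \mathbf{x}_{ij} \simeq \pi\!\big( K \, [\, R_i \mid \mathbf{t}_i \,] \, \mathbf{X}_j \big), \qquad i = 1,\dots,N_j .
\]
For an F2F feature matched as \((\mathbf{x}, \mathbf{x}')\) in normalized coordinates across two consecutive frames with inter-frame motion \((R, \mathbf{t})\), the pair obeys the epipolar constraint
\[
  \mathbf{x}'^{\top} E \, \mathbf{x} = 0, \qquad E = [\mathbf{t}]_{\times} R ,
\]
where \([\mathbf{t}]_{\times}\) is the skew-symmetric cross-product matrix. The epipolar form involves only the differential motion, with \(\mathbf{t}\) recoverable only up to scale, and does not involve the scene structure.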
F2F features have been used in some analytic recursive filters for the purpose of motion estimation, such as the essential filter of Soatto et al. [27], which uses the epipolar constraint as a measurement equation. Soatto and Perona also introduced the subspace filter [26], which uses the subspace method of Jepson and Heeger [10], based on optical flow, as a measurement mechanism. However, such approaches suffer from two major limitations. First, the translation magnitude between different frames cannot be estimated relative to a common gauge, and hence the obtained estimates cannot be integrated together to determine the absolute motion. Second, only the motion can be estimated reliably. Furthermore, such filters have cubic computational complexity in the number of F2F features and hence cannot run in real time with a large number of features. We postulate here that F2F information is complementary to that of LR features, and that integrating the two can significantly improve online filter-based SFM.
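To see where the cubic cost arises, consider a generic Kalman-style update written in standard EKF notation (a sketch for illustration, not the specific formulation of [27] or of this paper). Stacking the epipolar constraints of \(m\) F2F features into a single measurement vector
\[
  \mathbf{z} = \Big( \mathbf{x}'^{\top}_{k} \, E(\boldsymbol{\xi}) \, \mathbf{x}_{k} \Big)_{k=1}^{m}, \qquad
  S = H P H^{\top} + R_z, \qquad
  G = P H^{\top} S^{-1},
\]
where \(\boldsymbol{\xi}\) is the motion state, \(H = \partial \mathbf{z} / \partial \boldsymbol{\xi}\) is the \(m \times n\) measurement Jacobian, \(P\) the \(n \times n\) state covariance, \(R_z\) the measurement noise covariance and \(G\) the gain, makes the innovation covariance \(S\) an \(m \times m\) matrix. Computing \(S^{-1}\) costs \(O(m^3)\), which becomes prohibitive when hundreds of F2F features are matched per frame.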