Third-Eye Stereo Analysis Evaluation Enhanced by Data Measures

Verónica Suaste¹, Diego Caudillo¹, Bok-Suk Shin², and Reinhard Klette²

¹ CIMAT and the University of Guanajuato, Mexico
² The .enpeda.. Project, The University of Auckland, New Zealand

Abstract. Third-eye stereo analysis evaluation compares a virtual image, derived from results obtained by binocular stereo analysis, with a recorded image at the same pose. This technique is applied for evaluating stereo matchers on long (or continuous) stereo input sequences where no ground truth is available. The paper provides a critical and constructive discussion of this method. It also introduces data measures on input video sequences as an additional tool for analyzing issues of stereo matchers that occur in particular scenarios, and it reports on extensive experiments using two top-rated stereo matchers.

1 Introduction

Modern applications of stereo analysis require that stereo matchers work accurately on long or continuous binocular input video data. For example, in vision-based driver assistance, those data are recorded for any possible traffic scenario [9]. Robust matchers need to work accurately for various scenarios. In general, it is expected that there is no single best matcher; an adaptive selection of a matcher (within a given 'toolbox') appears to be a possible solution.

The third-eye method of [11] provides stereo analysis performance evaluation for long or continuously recorded stereo sequences. For a current application of this method, see [12]. In this paper we provide a critical and constructive discussion of this method, pointing out weaknesses and outlining ways to overcome them. Video data measures are used to discuss solutions and to propose ways for a detailed analysis of situations where a stereo matcher fails (and should be improved accordingly), extending our initial discussion of data measures in [10].
For testing, the eight long trinocular stereo sequences of Set 9 on EISATS [4] have been used (each 400 stereo frames long, except the 'People' sequence, which is only 234 frames long); see Fig. 1. The tested stereo matchers are iterative semi-global matching (iSGM) [7] and linear belief propagation (linBP) [10]. Both apply the census transform as the data cost function, and linBP uses a truncated linear smoothness constraint [5]. Both stereo matchers, iSGM and linBP, rank high on the KITTI stereo benchmark suite (www.cvlibs.net/datasets/kitti/).

The paper is structured as follows: Section 2 provides the notation and definitions used. Section 3 illustrates interesting cases when using the third-eye approach. Section 4 discusses the use of data measures for solving critical cases and for discussing stereo performance in more detail. Section 5 concludes.
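As an illustration of the two cost terms named above for the tested matchers, the following minimal sketch shows a census transform with a Hamming-distance data cost and a truncated linear smoothness penalty. It is not the implementation used in iSGM or linBP: the function names, the 3 × 3 window (`radius=1`), the strict "darker than centre" comparison, and the values of `slope` and `tau` are all assumptions chosen for illustration.

```python
import numpy as np


def census_transform(img, radius=1):
    """Census signature per pixel: one bit per neighbour in a
    (2*radius+1)^2 window, set when the neighbour is darker than the centre.
    Window size and comparison rule are illustrative assumptions."""
    img = np.asarray(img, dtype=np.int64)
    h, w = img.shape
    padded = np.pad(img, radius, mode="edge")  # replicate border pixels
    sig = np.zeros((h, w), dtype=np.uint64)
    for dy in range(-radius, radius + 1):
        for dx in range(-radius, radius + 1):
            if dy == 0 and dx == 0:
                continue  # the centre pixel is not compared with itself
            neighbour = padded[radius + dy:radius + dy + h,
                               radius + dx:radius + dx + w]
            bit = (neighbour < img).astype(np.uint64)
            sig = (sig << np.uint64(1)) | bit
    return sig


def hamming_cost(sig_a, sig_b):
    """Data cost: number of differing bits between two census signatures."""
    x = np.bitwise_xor(np.asarray(sig_a, dtype=np.uint64),
                       np.asarray(sig_b, dtype=np.uint64))
    count = np.zeros(x.shape, dtype=np.int64)
    while np.any(x):
        count += (x & np.uint64(1)).astype(np.int64)
        x = x >> np.uint64(1)
    return count


def truncated_linear(d_p, d_q, slope=1.0, tau=2.0):
    """Smoothness penalty min(slope * |d_p - d_q|, tau) between the
    disparities of two neighbouring pixels (slope and tau are hypothetical)."""
    diff = np.abs(np.asarray(d_p, dtype=float) - np.asarray(d_q, dtype=float))
    return np.minimum(slope * diff, tau)
```

Because the census signature depends only on the intensity ordering within the window, adding a constant brightness offset to an image leaves the signatures unchanged, which is one reason this data cost is attractive for outdoor sequences with varying illumination. The truncation at `tau` limits the penalty at large disparity jumps, so genuine depth discontinuities are not over-smoothed.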