1646 IEEE TRANSACTIONS ON SYSTEMS, MAN, AND CYBERNETICS—PART B: CYBERNETICS, VOL. 34, NO. 4, AUGUST 2004

On Combining Support Vector Machines and Simulated Annealing in Stereovision Matching

Gonzalo Pajares and Jesús M. de la Cruz

Abstract—This paper outlines a method for solving the stereovision matching problem using edge segments as the primitives. In stereovision matching, the following constraints are commonly used: epipolar, similarity, smoothness, ordering, and uniqueness. We propose a new strategy in which such constraints are sequentially combined. The goal is to achieve high performance in terms of correct matches by combining several strategies. The contributions of this paper are the development of a similarity measure through a support vector machines classification approach; the transformation of the smoothness, ordering, and epipolar constraints into an energy function, optimized through simulated annealing, whose minimum value corresponds to a good matching solution; and the introduction of specific conditions to overcome the violation of the smoothness and ordering constraints. The performance of the proposed method is illustrated by comparative analysis against some recent global matching methods.

Index Terms—Epipolar, matching, ordering, similarity, simulated annealing, smoothness, stereovision, support vector machines, uniqueness.

I. INTRODUCTION

A MAJOR PORTION of the research efforts of the computer vision community has been directed toward the study of the three-dimensional (3-D) structure of objects using machine analysis of images [1]. According to [2], we can view the problem of stereo analysis as consisting of the following steps: image acquisition, camera modeling, feature acquisition, image matching, depth determination, and interpolation.
The key step is that of image matching, that is, the process of identifying the corresponding points in two images that are cast by the same physical point in 3-D space. This paper is devoted solely to this problem.

The basic principle involved in the recovery of depth using passive imaging is triangulation, which is achieved with the help of only the existing environmental illumination. Hence, a correspondence needs to be established between features from two images that correspond to some physical feature in space. Then, provided that the position of the centers of projection, the focal length, the orientation of the optical axis, and the sampling interval of each camera are known, the depth can be established by triangulation.

Manuscript received August 28, 2003; revised February 29, 2004. This work was supported in part under projects CICYT DPI2002-02924 and CICYT TAP94-0832-C02-01. This paper was recommended by Associate Editor X. Jiang.
The authors are with the Departamento Arquitectura de Computadores y Automática, Facultades de Informática y Físicas, Universidad Complutense, 28040 Madrid, Spain (e-mail: pajares@dacya.ucm.es).
Digital Object Identifier 10.1109/TSMCB.2004.827391

A. Techniques in Stereovision Matching

A review of the state of the art in stereovision matching allows us to distinguish two sorts of techniques broadly used in this discipline: area-based and feature-based [3], [4]. Area-based stereo techniques use correlation between brightness (intensity) patterns in the local neighborhood of a pixel in one image and brightness patterns in the local neighborhood of a pixel in the other image [5]–[7], where the number of possible matches is intrinsically high, while feature-based methods use sets of pixels with similar attributes, normally either pixels belonging to edges [8]–[10] or the corresponding edges themselves [4], [11]–[17].
These latter methods lead only to a sparse depth map, leaving the rest of the surface to be reconstructed by interpolation, but they are faster than area-based methods because only a small number of features need to be considered. We select a feature-based method with edge segments as features, as they are abundant in the environment where our mobile robot equipped with the stereovision system navigates. Edge segments have been studied in terms of reliability [3] and robustness [18].

B. Constraints Applied in Stereovision Matching

Our stereo correspondence problem can be defined in terms of finding pairs of true matches, namely, pairs of edge segments in two images that are generated by the same physical edge segment in space. These true matches generally satisfy some of the following constraints [6], [8], [10]:

1) epipolar: given two segments, one in the left image and a second in the right one, if we slide one of them along a direction parallel to the epipolar line, they would intersect (overlap) (Fig. 1);
2) similarity: matched edge segments have similar local properties or attributes;
3) smoothness: disparity values in a given neighborhood change smoothly, except at a few depth discontinuities;
4) ordering: the relative position between two edge segments in the left image is preserved in the right one for the corresponding matches;
5) uniqueness: each edge segment in one image should be matched to a unique edge segment in the other image.

The similarity and uniqueness constraints are associated with a local matching process, the smoothness and ordering constraints with a global matching process, and the epipolar constraint with both local and global processes. The major difficulty of stereo processing arises from the need to make global correspondences.

1083-4419/04$20.00 © 2004 IEEE
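To make three of the constraints listed in Section I-B concrete (epipolar, ordering, and uniqueness), the following minimal Python sketch checks them on candidate matches. It is an illustration only and not the authors' implementation. It assumes rectified images, so that epipolar lines are horizontal and "sliding along the epipolar line" reduces to comparing vertical extents. It also assumes that each edge segment is described by its two endpoints and that relative position is measured by the horizontal midpoint. The names `Segment`, `satisfies_epipolar`, `satisfies_uniqueness`, and `ordering_violations` are hypothetical.

```python
from dataclasses import dataclass
from itertools import combinations
from typing import List, Sequence, Tuple

Pair = Tuple[int, int]  # (index of left segment, index of right segment)


@dataclass(frozen=True)
class Segment:
    """Edge segment given by its two endpoints (hypothetical representation)."""
    x1: float
    y1: float
    x2: float
    y2: float

    @property
    def mid_x(self) -> float:
        # Horizontal midpoint, used here as the segment's position.
        return (self.x1 + self.x2) / 2.0

    @property
    def y_range(self) -> Tuple[float, float]:
        # Vertical extent as (lowest y, highest y).
        return (min(self.y1, self.y2), max(self.y1, self.y2))


def satisfies_epipolar(left: Segment, right: Segment) -> bool:
    """True if the segments would overlap when slid horizontally.

    Assumes rectified images (horizontal epipolar lines), so the test
    reduces to an overlap of the two vertical extents.
    """
    lo_l, hi_l = left.y_range
    lo_r, hi_r = right.y_range
    return max(lo_l, lo_r) <= min(hi_l, hi_r)


def satisfies_uniqueness(pairs: Sequence[Pair]) -> bool:
    """True if no left or right segment appears in more than one pair."""
    lefts = [i for i, _ in pairs]
    rights = [j for _, j in pairs]
    return len(set(lefts)) == len(lefts) and len(set(rights)) == len(rights)


def ordering_violations(pairs: Sequence[Pair],
                        left_segs: Sequence[Segment],
                        right_segs: Sequence[Segment]) -> List[Tuple[Pair, Pair]]:
    """Return the couples of pairs whose left-to-right order is reversed."""
    violations = []
    for (i, j), (k, m) in combinations(pairs, 2):
        d_left = left_segs[i].mid_x - left_segs[k].mid_x
        d_right = right_segs[j].mid_x - right_segs[m].mid_x
        # Opposite signs mean the relative position flipped between images.
        if d_left * d_right < 0:
            violations.append(((i, j), (k, m)))
    return violations


# Usage with made-up coordinates (not data from the paper).
left = [Segment(10, 5, 12, 30), Segment(40, 8, 41, 28)]
right = [Segment(6, 6, 8, 29), Segment(33, 10, 35, 27)]
candidate_pairs = [(0, 0), (1, 1)]
epipolar_ok = all(satisfies_epipolar(left[i], right[j]) for i, j in candidate_pairs)
unique_ok = satisfies_uniqueness(candidate_pairs)
flipped = ordering_violations(candidate_pairs, left, right)
```

The sketch treats each constraint as a hard yes/no test for clarity. The paper instead folds the smoothness, ordering, and epipolar constraints into an energy function, so a practical version would more likely turn these checks into penalty terms than use them as filters.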