1646 IEEE TRANSACTIONS ON SYSTEMS, MAN, AND CYBERNETICS—PART B: CYBERNETICS, VOL. 34, NO. 4, AUGUST 2004
On Combining Support Vector Machines and
Simulated Annealing in Stereovision Matching
Gonzalo Pajares and Jesús M. de la Cruz
Abstract—This paper outlines a method for solving the stereo-
vision matching problem using edge segments as the primitives.
In stereovision matching, the following constraints are commonly
used: epipolar, similarity, smoothness, ordering, and uniqueness.
We propose a new strategy in which such constraints are se-
quentially combined. The goal is to achieve high performance in
terms of correct matches by combining several strategies. The
contributions of this paper are the development of a
similarity measure through a support vector machines classification
approach; the transformation of the smoothness, ordering,
and epipolar constraints into an energy function, minimized
through a simulated annealing optimization approach, whose
minimum value corresponds to a good matching solution; and the
introduction of specific conditions to overcome violations of the
smoothness and ordering constraints. The performance of the
proposed method is illustrated by comparative analysis against
some recent global matching methods.
Index Terms—Epipolar, matching, ordering, similarity, sim-
ulated annealing, smoothness, stereovision, support vector
machines, uniqueness.
I. INTRODUCTION
A major portion of the research efforts of the computer
vision community has been directed toward the
study of the three-dimensional (3-D) structure of objects using
machine analysis of images [1]. According to [2], we can view
the problem of stereo analysis as consisting of the following
steps: image acquisition, camera modeling, feature acquisition,
image matching, depth determination, and interpolation. The
key step is that of image matching, that is, the process of
identifying the corresponding points in two images that are cast
by the same physical point in 3-D space. This paper is devoted
solely to this problem.
The basic principle involved in the recovery of depth using
passive imaging, which relies only on the existing environmental
illumination, is triangulation. Hence, a correspondence
needs to be established between features from the two
images that correspond to the same physical feature in space. Then,
provided that the positions of the centers of projection, the focal
length, the orientation of the optical axis, and the sampling interval
of each camera are known, the depth can be established
by triangulation.
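As a minimal sketch of the triangulation step, the snippet below assumes the simplest geometry, two identical cameras with parallel optical axes separated by a known baseline, in which depth reduces to Z = f B / d. This geometry, the function name, and the parameter names are illustrative assumptions and are not taken from the paper.

```python
def depth_from_disparity(x_left, x_right, focal_length_px, baseline):
    """Depth of a matched point under an ASSUMED parallel-axis camera pair.

    x_left, x_right  -- horizontal image coordinates (pixels) of the match
    focal_length_px  -- focal length divided by the sampling interval (pixels)
    baseline         -- distance between the two centers of projection
    The result is expressed in the same unit as `baseline`.
    """
    disparity = x_left - x_right
    if disparity <= 0:
        # A point in front of both cameras always has positive disparity
        # in this geometry, so anything else signals a wrong match.
        raise ValueError("disparity must be positive")
    return focal_length_px * baseline / disparity
```

With a focal length of 800 pixels and a baseline of 0.125 m, a match at columns 320 and 300 (disparity 20) gives a depth of 5 m. Because depth varies inversely with disparity, a wrong match translates directly into a wrong depth, which is why the matching step is the critical one.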
Manuscript received August 28, 2003; revised February 29, 2004. This
work was supported in part under projects CICYT DPI2002-02924 and
CICYT TAP94-0832-C02-01. This paper was recommended by Associate
Editor X. Jiang.
The authors are with the Departamento Arquitectura de Computadores y
Automática, Facultades de Informática y Físicas, Universidad Complutense, 28040
Madrid, Spain (e-mail: pajares@dacya.ucm.es).
Digital Object Identifier 10.1109/TSMCB.2004.827391
A. Techniques in Stereovision Matching
A review of the state of the art in stereovision matching allows
us to distinguish two broad classes of techniques used in this
discipline: area-based and feature-based [3], [4]. Area-based
stereo techniques correlate the brightness (intensity)
patterns in the local neighborhood of a pixel in one image with
the brightness patterns in the local neighborhood of a pixel in the other image
[5]–[7], where the number of possible matches is intrinsically
high, while feature-based methods use sets of pixels with similar
attributes, normally, either pixels belonging to edges [8]–[10],
or the corresponding edges themselves [4], [11]–[17]. The
latter methods yield only a sparse depth map, leaving the rest
of the surface to be reconstructed by interpolation, but they are
faster than area-based methods because there are fewer
features to consider. We select a feature-based method with
edge segments as features because they are abundant in the environment
in which our mobile robot, equipped with the stereovision
system, navigates. Edge segments have also been studied in terms of reliability
[3] and robustness [18].
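For contrast with the feature-based approach adopted here, the area-based correlation mentioned above can be sketched as follows. Zero-mean normalized cross-correlation is one common choice of correlation score; it is used here only as an illustration and is not claimed to be the measure used in the cited works. The function name and the list-of-rows patch representation are assumptions.

```python
import math

def normalized_cross_correlation(patch_a, patch_b):
    """Zero-mean normalized cross-correlation of two equally sized patches.

    Each patch is a list of rows of intensity values. The score lies in
    [-1, 1]; 1 means identical patterns up to a gain and an offset.
    """
    a = [v for row in patch_a for v in row]
    b = [v for row in patch_b for v in row]
    if not a or len(a) != len(b):
        raise ValueError("patches must be non-empty and of equal size")
    mean_a = sum(a) / len(a)
    mean_b = sum(b) / len(b)
    dev_a = [v - mean_a for v in a]
    dev_b = [v - mean_b for v in b]
    denom = math.sqrt(sum(x * x for x in dev_a) * sum(y * y for y in dev_b))
    if denom == 0.0:
        # A flat patch carries no pattern to correlate.
        return 0.0
    return sum(x * y for x, y in zip(dev_a, dev_b)) / denom
```

An area-based matcher would evaluate such a score for every candidate pixel along the epipolar line, which illustrates why the number of possible matches is high compared with matching a small set of edge segments.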
B. Constraints Applied in Stereovision Matching
Our stereo correspondence problem can be defined in terms
of finding pairs of true matches, namely, pairs of edge segments
in two images that are generated by the same physical edge seg-
ment in space. These true matches generally satisfy some of the
following constraints [6], [8], [10]:
1) epipolar, given two segments, one in the left image and
the other in the right one, if we slide one of them in a
direction parallel to the epipolar line, they intersect
(overlap) (Fig. 1);
2) similarity, matched edge segments have similar local
properties or attributes;
3) smoothness, disparity values in a given neighborhood
change smoothly, except at a few depth discontinuities;
4) ordering, the relative position of two edge segments
in the left image is preserved in the right one for the
corresponding matches;
5) uniqueness, each edge-segment in one image should be
matched to a unique edge-segment in the other image.
The similarity and uniqueness constraints are associated with a
local matching process, the smoothness and ordering constraints
with a global matching process, and the epipolar constraint with both
local and global processes. The major difficulty of stereo processing
arises due to the need to make global correspondences.
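Three of the constraints listed above lend themselves to simple geometric tests, sketched below under loudly stated assumptions: epipolar lines are taken to be horizontal (rectified images), so sliding a segment along the epipolar direction leaves its row range unchanged, and the relative position of two segments is approximated by comparing the x coordinates of their midpoints. The `Segment` type and all function names are hypothetical and do not come from the paper, which builds its similarity measure and energy function differently.

```python
from typing import NamedTuple

class Segment(NamedTuple):
    """Edge segment given by its two end points (hypothetical type)."""
    x1: float
    y1: float
    x2: float
    y2: float

def epipolar_overlap(left, right):
    """True if the row ranges intersect (ASSUMES horizontal epipolar lines)."""
    lo_l, hi_l = sorted((left.y1, left.y2))
    lo_r, hi_r = sorted((right.y1, right.y2))
    return max(lo_l, lo_r) <= min(hi_l, hi_r)

def mid_x(seg):
    """Horizontal coordinate of the segment midpoint."""
    return (seg.x1 + seg.x2) / 2.0

def ordering_preserved(left_a, left_b, right_a, right_b):
    """True if a lies on the same side of b in both images.

    left_a is matched to right_a and left_b to right_b; position is
    approximated by midpoints, which is a simplification.
    """
    return (mid_x(left_a) - mid_x(left_b)) * (mid_x(right_a) - mid_x(right_b)) > 0

def is_unique(matches):
    """True if no left or right segment index appears in two (left, right) pairs."""
    lefts = [l for l, _ in matches]
    rights = [r for _, r in matches]
    return len(set(lefts)) == len(lefts) and len(set(rights)) == len(rights)
```

The epipolar and uniqueness tests involve only the candidate pair or the match list itself, whereas the ordering test needs a second matched pair, which reflects the local versus global distinction drawn above.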
1083-4419/04$20.00 © 2004 IEEE