ADAPTIVE OBJECT IDENTIFICATION AND RECOGNITION USING NEURAL
NETWORKS AND SURFACE SIGNATURES
Sameh M. Yamany
System and Biomedical Eng.
Cairo University, Egypt
email: yamany@ieee.org
Aly A. Farag
Computer and Electrical Eng.
University of Louisville KY, USA
email: farag@cvip.uofl.edu
Abstract
This paper introduces an adaptive technique for 3D object
identification and recognition in 3D scanned scenes. This
technique uses neural learning of the 3D free-form surface
representation of the object under study. This representation
scheme captures the 3D curvature information of any free-
form surface and encodes it into a 2D image corresponding
to a certain point on the surface. This image represents a
“Surface Signature” because it is unique to this point and
is independent of the object translation or orientation in
space.
1. Introduction
Registration is an integral part of computer and robot
vision systems and remains a topic of high interest in both
fields. Its importance stems from its role in many
applications, including surface matching[1], 3D medical
imaging[2, 3], pose estimation[4], object recognition[5, 6, 7]
and data fusion[8, 9].
In order for any surface registration algorithm to perform
accurately and efficiently, an appropriate representation
scheme for the surface is needed. Most of the surface
representation schemes found in the literature have adopted
some form of shape parameterization, especially for the
purpose of object recognition. However, free-form surfaces,
in general, may not have simple volumetric shapes that can
be expressed in terms of parametric primitives. Dorai and
Jain[5] have defined a free-form surface to be “a smooth
surface, such that the surface normal is well defined and
continuous almost everywhere, except at vertices, edges and
cusps.” Discontinuities in the surface normal or curvature,
and consequently in the surface depth, may be present
anywhere in a free-form surface. Some representation
schemes for free-form surfaces found in the literature
include the splash representation proposed by Stein and
Medioni[10], the point signature by Chua and Jarvis[11],
COSMOS by Dorai and Jain[5] and, more recently, the spin
image by Johnson and Hebert[7].
All of these representations are claimed to be invariant to
rigid transformation, but most of them fall under the local
surface representation class, which is known to be sensitive
to noise in the surface and to the feature extraction process
in general. In this paper, we use a general representation
scheme that (1) is invariant to rigid transformation, (2)
can be used as a global representation of the surface as
well as a local one, (3) can be used in recognition of
multiple objects in a scene with or without occlusion, and
(4) performs faster registration than existing registration
approaches[12].
The idea starts by identifying special points on the model
surface. These points are called important points because of
the information they carry. Then, for each important point,
an image capturing the surface curvature information seen
from this point is formed. This image is unique to this
point and is independent of the object translation or
orientation in space. For this reason we call this image the
Surface Point Signature (SPS). Object recognition is then
performed by matching SPS images against those of the
different library objects, so that the correct object yields
a high score of corresponding points. A neural network
configuration is used in the matching, where the whole SPS
image acts as the input and the desired response is the
(x,y,z) coordinates of the point at which this SPS image was
generated.
The training procedure starts by constructing an input-output
map using many SPS images of the model object. At run time,
the SPS images of the scene object are given to the network,
which in turn returns the closest (x,y,z) coordinates of a
point on the model object. Using three such correspondences,
the transformation parameters can be recovered. Because the
full SPS image is large, which makes training very slow, we
propose to use the horizontal and vertical projections of the
image as the inputs to the neural network rather than the
whole image itself. This reduces the accuracy of the
correspondence; the loss is compensated for by increasing the
number of learning samples.
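The two computational steps mentioned above, reducing an image to its projections and recovering a rigid transformation from three point correspondences, can be illustrated with a minimal sketch. This is not the implementation used in this paper: the function names (projection_features, rigid_transform_from_three) are hypothetical, the projections are assumed here to be plain row and column sums, and the transformation is recovered by aligning orthonormal frames built from the three points, which is exact only for noise-free, non-collinear correspondences. With noisy network outputs a least-squares fit over more correspondences would be the more usual choice.

```python
import math

def projection_features(image):
    """Concatenate row sums and column sums of a 2D image (list of rows).

    Assumption: 'horizontal and vertical projections' are taken to be
    plain sums along rows and along columns.
    """
    row_sums = [sum(row) for row in image]
    col_sums = [sum(col) for col in zip(*image)]
    return row_sums + col_sums

def _sub(a, b):
    return [a[i] - b[i] for i in range(3)]

def _cross(a, b):
    return [a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0]]

def _unit(a):
    norm = math.sqrt(sum(x * x for x in a))
    if norm == 0.0:
        raise ValueError("degenerate (coincident or collinear) points")
    return [x / norm for x in a]

def _frame(a, b, c):
    """Right-handed orthonormal frame (list of three axes) from three points."""
    e1 = _unit(_sub(b, a))
    e3 = _unit(_cross(_sub(b, a), _sub(c, a)))
    e2 = _cross(e3, e1)
    return [e1, e2, e3]

def rigid_transform_from_three(model_pts, scene_pts):
    """Return (R, t) such that scene = R * model + t for the three given pairs.

    model_pts and scene_pts are lists of three [x, y, z] points in
    corresponding order.
    """
    fm = _frame(*model_pts)
    fs = _frame(*scene_pts)
    # R maps each model axis onto the matching scene axis: R = Fs * Fm^T,
    # where the axes are the columns of Fs and Fm.
    rot = [[sum(fs[k][i] * fm[k][j] for k in range(3)) for j in range(3)]
           for i in range(3)]
    rotated = [sum(rot[i][j] * model_pts[0][j] for j in range(3))
               for i in range(3)]
    trans = _sub(scene_pts[0], rotated)
    return rot, trans

# Example: a 2x2 image, and a scene obtained from the model by a
# 90-degree rotation about z followed by a translation of (1, 2, 3).
features = projection_features([[1, 2], [3, 4]])
model = [[0.0, 0.0, 0.0], [1.0, 0.0, 0.0], [0.0, 1.0, 0.0]]
scene = [[1.0, 2.0, 3.0], [1.0, 3.0, 3.0], [0.0, 2.0, 3.0]]
rotation, translation = rigid_transform_from_three(model, scene)
```

For an N x M image the projection vector has N + M entries instead of N * M, which is the source of the training-time saving and also of the loss of discriminative detail noted above.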
Proceedings of the IEEE Conference on Advanced Video and Signal Based Surveillance (AVSS’03)
0-7695-1971 3 $17.00 © 2003 IEEE