ADAPTIVE OBJECT IDENTIFICATION AND RECOGNITION USING NEURAL NETWORKS AND SURFACE SIGNATURES

Sameh M. Yamany
System and Biomedical Eng., Cairo University, Egypt
email: yamany@ieee.org

Aly A. Farag
Computer and Electrical Eng., University of Louisville, KY, USA
email: farag@cvip.uofl.edu

Abstract

This paper introduces an adaptive technique for 3D object identification and recognition in 3D scanned scenes. The technique uses neural learning of a 3D free-form surface representation of the object under study. This representation scheme captures the 3D curvature information of any free-form surface and encodes it into a 2D image associated with a particular point on the surface. The image is called a “Surface Signature” because it is unique to that point and is independent of the object's translation or orientation in space.

1. Introduction

The registration process is an integral part of computer and robot vision systems and remains a topic of high interest in both fields. The importance of the registration problem stems from its many applications, including surface matching [1], 3D medical imaging [2, 3], pose estimation [4], object recognition [5, 6, 7] and data fusion [8, 9].

For any surface registration algorithm to perform accurately and efficiently, an appropriate representation scheme for the surface is needed. Most of the surface representation schemes found in the literature adopt some form of shape parameterization, especially for the purpose of object recognition. However, free-form surfaces in general may not have simple volumetric shapes that can be expressed in terms of parametric primitives.
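The abstract says only that the signature encodes the curvature information seen from a point into a 2D image that does not change under rigid motion. The Python sketch below shows one way such an image could be built; it is an illustration, not necessarily the authors' exact construction. It assumes a point cloud with unit normals. The image is indexed by the distance d from the chosen point P to every other point Q and by the angle alpha between P's normal and Q - P. Each pixel accumulates beta, the angle between the normals at P and Q. Because d, alpha and beta are all preserved by rotations and translations, the image is preserved too. The function name `surface_point_signature` and the bin counts are hypothetical choices.

```python
import numpy as np


def surface_point_signature(points, normals, idx, n_dist=32, n_angle=32, d_max=None):
    """Hypothetical 2D signature of the surface as seen from points[idx].

    points, normals: (N, 3) arrays; normals are assumed to be unit length.
    Row i bins the distance d from P = points[idx] to another point Q,
    column j bins the angle alpha between P's normal and Q - P, and each
    pixel accumulates beta, the angle between the normals at P and Q.
    """
    p, n_p = points[idx], normals[idx]
    mask = np.arange(len(points)) != idx
    v = points[mask] - p
    n_q = normals[mask]
    d = np.linalg.norm(v, axis=1)
    keep = d > 1e-12                      # ignore duplicates of P
    v, d, n_q = v[keep], d[keep], n_q[keep]
    alpha = np.arccos(np.clip((v @ n_p) / d, -1.0, 1.0))   # in [0, pi]
    beta = np.arccos(np.clip(n_q @ n_p, -1.0, 1.0))        # normal deviation
    if d_max is None:
        d_max = d.max()                   # scale bins to the farthest point
    i = np.minimum((d / d_max * n_dist).astype(int), n_dist - 1)
    j = np.minimum((alpha / np.pi * n_angle).astype(int), n_angle - 1)
    img = np.zeros((n_dist, n_angle))
    np.add.at(img, (i, j), beta)          # accumulate the curvature cue per bin
    return img
```

Applying the same rotation to the points and normals and adding any translation to the points leaves the returned array unchanged, apart from floating-point effects at bin boundaries. This is the invariance property the abstract relies on.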
Dorai and Jain [5] have defined a free-form surface to be “a smooth surface, such that the surface normal is well defined and continuous almost everywhere, except at vertices, edges and cusps.” Discontinuities in the surface normal or curvature, and consequently in the surface depth, may be present anywhere on a free-form surface. Representation schemes for free-form surfaces found in the literature include the splash representation proposed by Stein and Medioni [10], the point signature by Chua and Jarvis [11], COSMOS by Dorai and Jain [5] and, more recently, the spin image by Johnson and Hebert [7]. All of these representations are claimed to be invariant to rigid transformation. However, most of them belong to the class of local surface representations, which is known to be sensitive to surface noise and to the feature extraction process in general. In this paper, we use a general representation scheme that (1) is invariant to rigid transformation, (2) can be used as a global as well as a local representation of the surface, (3) can be used to recognize multiple objects in a scene, with or without occlusion, and (4) performs registration faster than existing registration approaches [12].

The idea starts by identifying special points on the model surface, called important points because of the information they carry. Then, for each important point, an image is formed that captures the surface curvature information as seen from that point. This image is unique to the point and is independent of the object's translation or orientation in space. For this reason, we call it the Surface Point Signature (SPS). Object recognition is then performed by matching the SPS images of the scene against those of the different library objects. The correct object is the one that yields a high score of corresponding points.
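The recognition step can be sketched as a simple vote. Scene signatures are matched against each library object, and the object with the most corresponding points wins. In this sketch, plain zero-mean normalized correlation between SPS images stands in for the neural-network matcher the paper actually uses. The names `recognize` and `normalized_correlation` and the 0.9 threshold are illustrative assumptions, not details taken from the paper.

```python
import numpy as np


def normalized_correlation(a, b):
    """Zero-mean normalized correlation between two equally sized images."""
    a = a.ravel() - a.mean()
    b = b.ravel() - b.mean()
    denom = np.linalg.norm(a) * np.linalg.norm(b)
    return float(a @ b / denom) if denom > 0 else 0.0


def recognize(scene_sps, library, threshold=0.9):
    """Vote for the library object whose SPS images best explain the scene.

    scene_sps: list of 2D SPS images computed at scene points.
    library:   dict mapping object name -> list of 2D SPS images of that model.
    A scene signature counts as a corresponding point for an object when its
    best correlation with that object's signatures reaches `threshold`.
    Returns (best_name, scores).
    """
    scores = {}
    for name, model_sps in library.items():
        # Summing booleans counts how many scene signatures found a match.
        scores[name] = sum(
            max(normalized_correlation(s, m) for m in model_sps) >= threshold
            for s in scene_sps
        )
    best_name = max(scores, key=scores.get)
    return best_name, scores
```

The threshold trades missed correspondences against false ones. Occlusion only removes scene signatures, so it lowers the count for the correct object without favouring a wrong one.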
A neural network configuration is used for the matching. The whole SPS image acts as the input, and the desired response is the (x, y, z) coordinates of the point at which that SPS image was generated. Training starts by constructing an input-output map from many SPS images of the model object. At run time, the SPS images of the scene object are given to the network, which returns the (x, y, z) coordinates of the closest matching point on the model object. Using three such correspondences, the transformation parameters can be recovered. Because the SPS image is large, which would make training very slow, we propose to use its horizontal and vertical projections, rather than the whole image, as the inputs to the neural network. This reduces the accuracy of the correspondences, and the reduction is compensated for by increasing the number of training samples.

Proceedings of the IEEE Conference on Advanced Video and Signal Based Surveillance (AVSS’03) 0-7695-1971 3 $17.00 © 2003 IEEE
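The neural matching pipeline described above has three pieces. Each SPS image is reduced to its horizontal and vertical projections. A network then maps those projections to (x, y, z) model coordinates. Finally, a rigid transformation is recovered from three correspondences. The NumPy sketch below illustrates each piece under stated assumptions. The projections are taken as row and column sums. The network is a hypothetical one-hidden-layer tanh regressor (`TinyMLP`) trained by full-batch gradient descent, because this section does not specify the architecture. The transformation is estimated with the standard SVD-based least-squares (Kabsch) solution, a swapped-in technique rather than necessarily the authors' own.

```python
import numpy as np


def sps_projections(img):
    """Horizontal (row-sum) and vertical (column-sum) projections, concatenated.

    An (m, n) SPS image becomes a vector of length m + n: the reduced
    network input described in the text."""
    return np.concatenate([img.sum(axis=1), img.sum(axis=0)])


class TinyMLP:
    """Hypothetical one-hidden-layer regressor: projection vector -> (x, y, z)."""

    def __init__(self, n_in, n_hidden=32, seed=0):
        rng = np.random.default_rng(seed)
        self.W1 = rng.normal(0.0, 1.0 / np.sqrt(n_in), (n_in, n_hidden))
        self.b1 = np.zeros(n_hidden)
        self.W2 = rng.normal(0.0, 1.0 / np.sqrt(n_hidden), (n_hidden, 3))
        self.b2 = np.zeros(3)

    def predict(self, X):
        # Cache hidden activations for use in train_step.
        self._H = np.tanh(X @ self.W1 + self.b1)
        return self._H @ self.W2 + self.b2

    def train_step(self, X, Y, lr=0.05):
        """One full-batch gradient step on 0.5 * mean ||prediction - Y||^2.

        Returns the mean squared error measured before the update."""
        err = self.predict(X) - Y
        n = len(X)
        gW2 = self._H.T @ err / n
        gb2 = err.mean(axis=0)
        dH = (err @ self.W2.T) * (1.0 - self._H ** 2)  # backprop through tanh
        gW1 = X.T @ dH / n
        gb1 = dH.mean(axis=0)
        self.W2 -= lr * gW2
        self.b2 -= lr * gb2
        self.W1 -= lr * gW1
        self.b1 -= lr * gb1
        return float((err ** 2).mean())


def rigid_from_correspondences(model_pts, scene_pts):
    """Least-squares R, t with scene ~= R @ model + t (Kabsch / SVD).

    Needs at least three non-collinear point pairs, one pair per row."""
    cm, cs = model_pts.mean(axis=0), scene_pts.mean(axis=0)
    H = (model_pts - cm).T @ (scene_pts - cs)
    U, _, Vt = np.linalg.svd(H)
    # Flip the last axis if needed so that R is a proper rotation (det = +1).
    d = np.sign(np.linalg.det(Vt.T @ U.T))
    R = Vt.T @ np.diag([1.0, 1.0, d]) @ U.T
    t = cs - R @ cm
    return R, t
```

In use, each training row of `X` would be `sps_projections` of an SPS image computed at a model point, and the matching row of `Y` would be that point's coordinates. At run time, the network's predictions for three scene points serve as `model_pts` and the scene points themselves as `scene_pts`, which gives the pose as `scene ~= R @ model + t`.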