Robust Feature Selection for Object Recognition using Uncertain 2D Image Data*

Tarak L. Gandhi
gandhi@cse.psu.edu
Dept. of Computer Science and Engineering
The Pennsylvania State University
University Park, PA 16802

Octavia I. Camps
camps@whale.ece.psu.edu
Dept. of Electrical Engineering
The Pennsylvania State University
University Park, PA 16802

*This work was supported in part by NSF grant IRI9309100.
1063-6919/94 $3.00 © 1994 IEEE

Abstract

The use of a small set of features is recurrent in the object recognition literature. If the image data is perfect, with no sensor uncertainty, and there are no incorrect feature correspondences between the model and the image, then the pose of the object can be computed with no error using these few correspondences. However, in most real cases the noise in the data will propagate into the pose. Moreover, the extent of the effect of the uncertainty will depend on the selection of the correspondences used to compute it. In this paper we address the problem of how to select these correspondences so that the effect of the data uncertainty on the pose estimation is minimized.

1 Introduction

Most model-based computer vision systems attempt to recognize and locate 3D objects from a 2D image of a scene by pairing features from a set of stored models with features extracted from the image. These correspondences are found using techniques such as interpretation trees [10, 7, 2], hashing [27, 5, 9, 26], alignment [17], bipartite search [20], and automated programming [1]. The pairings are such that the features in the image can be obtained (approximately) by applying a geometric transformation to their corresponding model features. This transformation is usually referred to as the pose of the object, that is, the position of the object with respect to a coordinate system. Most methods to compute the pose use a few point-to-point [14, 6, 12, 11] or line-to-line [21, 22] correspondences. If the data is perfect, with no sensor uncertainty and no incorrect correspondences, then the pose is exact, and the transformed model features coincide exactly with the image features.

However, in most real cases the noise in the data will propagate into the pose. Moreover, the extent of the effect of the uncertainty depends on the correspondences used to compute the pose. In particular, an important issue that affects the performance of a recognition system is that the accuracy of a pose computed from a small number of correspondences can vary widely depending on which correspondences are selected, even when the same number of correspondences is used.

Recently, Grimson et al. [12] presented a detailed study of how sensor uncertainty in the data propagates into the pose when it is computed using three point correspondences and the method given in [11]. Furthermore, they used their results to analyze the effects of sensor noise on the performance of systems that use feature alignment or hashing schemes to do object identification. However, at the present time there are no methodologies available to improve the performance of a recognition system by incorporating studies such as [12] into the selection process of the feature correspondences. Thus, designers are forced to build recognition systems in an iterative fashion, trying different feature selection heuristics until the desired level of performance is achieved.

2 Statement of the Problem

The use of a small set of features is recurrent in the literature. Perceptual grouping was first suggested by Lowe [23]. Henikoff and Shapiro [15] defined interesting patterns formed by triplets of line segments and found that they were useful in reducing the number of hypothesized models. In the work by Mohan and Nevatia [24] and in the systems 3DPO [16] and 3D-POLY [25], a few “local” or “kernel” sets of features were used.
Hansen [13] used a set of filters in order to reduce the number of features to be considered. Flynn [8] proposed the use of a utility measure of the features in order to reduce the number of hypotheses made by a hashing scheme. Ikeuchi and Kanade [18], Chen and Mulgaonkar [4], and Camps [3] used the concept of feature detectability to rank features in decreasing order of detectability.

In spite of all this research activity, the selection of “good” features to be matched in object recognition remains difficult. In this paper we address the following problem:

Let N be the number of model points and n ≤ N be the number of points that are actually used to compute the pose of the object. Then, find the subset of n model points such that the effect of the data uncertainty on the estimation of the pose is minimized.

3 Definitions and Notation

In photogrammetric terminology, the exterior orientation of a camera is specified by all the parameters (three rotation angles and a translation vector) that