Planning Sensing Strategies in a Robot Work Cell with Multi-Sensor Capabilities

S. A. Hutchinson, R. L. Cromwell, and A. C. Kak
Robot Vision Laboratory
School of Electrical Engineering
Purdue University
West Lafayette, IN 47907

ABSTRACT

In this paper we present an approach to planning sensing strategies in a robot work cell with multi-sensor capabilities. The system first forms an initial set of object hypotheses by using one of the sensors. Subsequently, the system reasons over different possibilities for selecting the next sensing operation, this being done in a manner so as to maximally disambiguate the initial set of hypotheses. The "next sensing operation" is characterized by both the choice of the sensor and the viewpoint to be used. Aspect graph representations of objects play a central role in the selection of the viewpoint, these representations being derived automatically by a solid modelling program.

1. Introduction

With current techniques in geometric modeling, it is possible to generate object models with a large number of features and relationships between those features. Likewise, given the current state of computer vision (both 2D and 3D) and tactile sensing, it is possible to derive large feature sets from sensory data. Unfortunately, large feature sets can also require exponential computational resources unless one takes advantage of the fact that most objects can be recognized by a few landmarks. The problem then becomes one of developing computer procedures capable of analyzing geometric models to yield the most discriminating feature sets. In solving this problem, one has to bear in mind that in the robotic cells of today we have available to us a variety of sensors, each capable of measuring a different attribute of the object.

For this approach to be useful in robotic assembly, we need to add another dimension to the problem as stated above. Say we have a robot trying to determine the identities of the objects in its work area. The robot should invoke only those sensory operations that are most relevant to the disambiguation of whatever hypotheses the robot might entertain about the identities of those objects. Therefore, the most discriminating features invoked by the robot must be determined at run time and, of course, must take maximum advantage of all the sensors that are available.

If we limit ourselves to just vision sensing, and if the run-time capability is not important, problems of this sort have recently been solved by a number of researchers, most notably Ikeuchi [8] and Hanson and Henderson [6]. Ikeuchi's work is based on the automatic synthesis of interpretation trees which are used to guide feature selection. In Ikeuchi's approach, the higher level nodes in the interpretation tree yield the aspect of the object, and then the lower level nodes are used for computing the precise pose of the object. This scheme makes use of the fact that for most objects the set of features useful for discriminating between aspects differs from the set of features useful for determining the exact pose once the aspect has been determined. In the work reported by Hanson and Henderson, a set of filters is used to select the best identifying features (based on rarity, robustness, cost, etc.) for each aspect. These features and their associated aspects are compiled into a strategy tree which, in purpose, is similar to Ikeuchi's interpretation tree. The strategy tree has two levels.
Each node at the first level allows aspect hypotheses to be invoked on the basis of certain features and their values. For each hypothesis at a first-level node, there exists a Corroborating Evidence Subtree, which is used to guide the search for evidence that supports that hypothesis and to carry out the computations on geometric data for determining the object's pose.

The work that we present in this paper extends the above-cited work by giving the system the ability to use multiple aspects and different sensors for the identification of an object and the computation of its pose. The sensor types currently incorporated in the system include a 3D range scanner, 2D overhead cameras, a manipulator-held 2D camera, a wrist-mounted force/torque (F/T) sensor, and the manipulator fingers for estimating the grasp width. These sensors can be used to examine objects from arbitrary viewpoints. Also, the manipulator and F/T sensor can be used to measure other features such as weight, the depth of occluded holes in the object, etc. It is important to realize that with these additional sensory inputs, we can discriminate between object identities, aspects, and poses that would otherwise appear indistinguishable to a fixed-viewpoint vision-based system. Our system is capable of dynamic viewpoint selection if that is what is needed for optimum disambiguation between the currently held hypotheses.

* This work was supported by the Engineering Research Center for Intelligent Manufacturing at Purdue University and the industry-supported Purdue CIDMAC program.

We attack the problem of viewpoint and sensor-type selection as follows. After observing the object from an initial viewpoint with, say, a vision sensor, a set of hypotheses is formulated about the object identity and pose. We then search for a viewpoint that will enable the system to observe features which will best discriminate between the competing hypotheses. This is possible because, for any active hypothesis, we can predict the feature set which would be observed from a candidate viewpoint with a candidate sensor if that hypothesis were correct. By doing this for each active hypothesis, we can determine the amount of ambiguity that would be resolved using that viewpoint-sensor combination. This is the crux of our approach.

In the remainder of the paper, we will present our technique in some detail. In the next section, we will describe how object hypotheses are developed, and present a measure of ambiguity in a set of object hypotheses. Section 3 describes the method we use to predict the features that can be observed from a particular viewpoint. In Section 4, we describe the types of features that our system uses, and give a brief overview of how these features are derived from sensory data. In Section 5, we describe the algorithm that we have implemented to search through the space of viewpoints. The algorithm makes use of the object's aspect graphs, the search space being comprised of the set of viewpoints corresponding to the nodes in the graph. In this section, we also discuss future work which will incorporate uncertainty into the algorithm. The paper concludes with a brief discussion of our experimental work.

2. Approach

In order to illustrate our problem and the approach that we will take, we begin this section with a two-dimensional example. The problem in the example consists of making sensory measurements for distinguishing between the two 2D objects shown in Fig. 1; the selection loop that drives such measurements is sketched below.
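Before developing the example, it may help to restate the selection loop described above in code. The following Python sketch is ours, not the paper's: all names (select_next_sensing_operation, predict, etc.) are hypothetical placeholders, and the expected-survivors count used to score a candidate is only a stand-in for the ambiguity measure that this section goes on to define.

```python
def select_next_sensing_operation(hypotheses, candidates, predict):
    """Choose the (sensor, viewpoint) pair whose predicted observations
    best separate the active hypotheses.

    hypotheses -- the currently active hypotheses (identity + pose)
    candidates -- iterable of (sensor, viewpoint) pairs; in the paper the
                  candidate viewpoints come from the aspect graph's nodes
    predict    -- predict(h, sensor, viewpoint) -> frozenset of features
                  that would be observed if hypothesis h were correct
    """
    def expected_survivors(sensor, viewpoint):
        # Hypotheses that predict identical feature sets from this
        # viewpoint/sensor remain indistinguishable after the reading.
        # Assuming equally likely hypotheses, the expected number of
        # survivors is the sum over groups of (group size)^2 / total.
        groups = {}
        for h in hypotheses:
            groups.setdefault(predict(h, sensor, viewpoint), []).append(h)
        return sum(len(g) ** 2 for g in groups.values()) / len(hypotheses)

    return min(candidates, key=lambda c: expected_survivors(*c))


# Toy usage: two hypotheses that only viewpoint "V2" can tell apart.
predictions = {
    ("camera", "V1"): {"object_A": frozenset({"edge"}),
                       "object_B": frozenset({"edge"})},
    ("camera", "V2"): {"object_A": frozenset({"edge", "hole"}),
                       "object_B": frozenset({"edge"})},
}
predict = lambda h, s, v: predictions[(s, v)][h]
print(select_next_sensing_operation(["object_A", "object_B"],
                                    [("camera", "V1"), ("camera", "V2")],
                                    predict))      # -> ('camera', 'V2')
```

Under this stand-in measure, a candidate scores well when the active hypotheses predict many distinct feature sets from it, which is the behaviour the edge example below illustrates.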
We will assume that a sensory measurement yields the edges that are visible from a particular viewpoint, the length and the orientation of the measured edges being subject to experimental error. To facilitate the explanation, we have used the integers 1, 2, 3, and 4 to denote the sides of one object, and the letters a, b, c, d, e, and f to denote the sides of the other.

Suppose that the first sensor reading that we obtain is the straight line segment (denoted by the label S1) observed from the viewpoint V, as illustrated in Fig. 2a. This line segment could correspond to a number of possible edges. Note that since there is some uncertainty in the edge extraction process, the length of the sensed edge cannot be assumed to be absolutely accurate. Therefore, it is possible to map a measured edge to any model edge whose length is within some tolerance of the length of the measured edge. The possible assignments of model edges to the measured edge are illustrated in Fig. 2b. Note that if the system was constrained to

Fig. 1: Two 2-dimensional objects with edge labels.
Fig. 2: (a) shows a sensed edge as observed from V. (b) indicates the possible matching model edges.
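The mapping of a measured edge to every model edge within a length tolerance is the step that generates the initial hypothesis set. A minimal Python sketch of that step follows; the edge labels match Fig. 1, but the lengths and the tolerance are invented here for illustration, since the figure's dimensions are not given in the text.

```python
# Hypothetical model-edge lengths for the two objects of Fig. 1; the
# labels follow the figure, but these lengths are invented for
# illustration (the actual dimensions are not given in the text).
MODEL_EDGES = {
    "1": 4.0, "2": 2.0, "3": 4.0, "4": 2.0,        # first object
    "a": 4.0, "b": 1.0, "c": 2.0, "d": 2.0,        # second object
    "e": 2.0, "f": 3.0,
}

def candidate_edges(measured_length, tolerance=0.25):
    """Return every model edge the sensed edge could correspond to,
    given that edge extraction is subject to experimental error."""
    return [label for label, length in MODEL_EDGES.items()
            if abs(length - measured_length) <= tolerance]

# A sensed edge of length 4.1 is ambiguous among sides 1, 3, and a:
print(candidate_edges(4.1))   # -> ['1', '3', 'a']
```

Each matching label then anchors one object/pose hypothesis for the next sensing operation to disambiguate.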