IEEE TRANSACTIONS ON PATTERN ANALYSIS AND MACHINE INTELLIGENCE, VOL. PAMI-6, NO. 3, MAY 1984

Correspondence

Visual and Conceptual Hierarchy: A Paradigm for Studies of Automated Generation of Recognition Strategies

DAVID A. ROSENTHAL AND R. BAJCSY

Abstract-The purpose of this correspondence is to show the design considerations in the choice of mechanisms when a flexible query system for visual scenes is being constructed. More concretely, the issues are:
* flexibility in adding new information to the knowledge base;
* power of inferencing;
* avoiding unnecessary generation of hypotheses for which a great deal of image processing has to be performed in order to test them;
* having the power of automatic generation of recognition strategies.

Index Terms-Automatic generation of recognition strategies, computer vision, rule-based system.

I. INTRODUCTION

In the past, most of the research in computer vision has concentrated on the development of image processing routines, such as finding edges and regions, outlines, shapes, and their descriptions. This process is usually called feature extraction and/or image (scene) segmentation, and its main purpose is to reduce the data while preserving the most characteristic invariant or semi-invariant features. The recognition process then takes these features and descriptions and matches them to some model(s). This process is, by and large, explicitly embedded in the recognition program and is frequently very dependent on the application.

Recently, there have been attempts to make this process more systematic and at the same time flexible [2], [3], [6]. Rule-based systems, in particular, have turned out to be very useful for this purpose. The goal, however, should be to discover the right structure of the rules, which in turn will allow an automatic data-driven and/or query-driven generation of recognition strategies. This is exactly the subject of this paper.
We start with a query, which restricts the areas of analysis to highly probable places and suggests a range of expected sizes and shapes. In turn, the size expectation is translated into the spatial resolution necessary to make the searched object visible. This naturally reduces the number of pixels used in the analysis. From there on the strategy is data driven, i.e., it depends on the local contrast, noise, continuity of boundaries, etc.

The domain chosen is aerial images of an urban area (Washington, DC). The rationale behind this choice was that:
1) the aerial image is an orthogonal projection of a three-dimensional world without any perspective distortion; hence the image processing part will be simplified,
2) an urban area, as opposed to a nonurban area, offers regular shapes, which in turn are used as features during the recognition process.

Currently, the system is capable of locating different types of motor vehicles, various street markings, buildings, objects which appear on the tops of buildings, and other objects.

Manuscript received February 4, 1983; revised June 16, 1983. This work was supported by the National Science Foundation under Grant MCS-8207294.
The authors are with the Department of Computer Science and Information Science, University of Pennsylvania, Philadelphia, PA 19104.

In our earlier paper [1], we described the motivation behind the conceptual and visual hierarchy. Our notion of conceptual hierarchies is that visual concepts or objects can be ordered with respect to some criteria, such as part-whole relationships, class inclusion, contextual relations, etc.

Visual hierarchy has received a fair amount of attention in image processing in the past [4], [8], [9], [14]. It has been called the "pyramid" data structure or processing cone.
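The step from an expected object size to a working resolution, and the pyramid it selects from, can be illustrated with a minimal sketch. This is not the authors' implementation: the 2x2 averaging reduction, the function names `build_pyramid` and `coarsest_usable_level`, and the `min_visible_px` threshold are all assumptions made for illustration only.

```python
def build_pyramid(image, levels):
    """Return a list of images, finest first, each half the size of the last.

    `image` is a list of rows of numbers. Reduction by 2x2 averaging is an
    assumption; the paper does not specify the reduction operator here.
    """
    pyramid = [image]
    for _ in range(levels - 1):
        prev = pyramid[-1]
        rows, cols = len(prev) // 2, len(prev[0]) // 2
        if rows == 0 or cols == 0:
            break  # cannot reduce any further
        reduced = [
            [
                (prev[2 * r][2 * c] + prev[2 * r][2 * c + 1]
                 + prev[2 * r + 1][2 * c] + prev[2 * r + 1][2 * c + 1]) / 4.0
                for c in range(cols)
            ]
            for r in range(rows)
        ]
        pyramid.append(reduced)
    return pyramid


def coarsest_usable_level(expected_size_px, min_visible_px, levels):
    """Pick the coarsest level at which the object still spans min_visible_px.

    `expected_size_px` is the object's expected extent at full resolution
    (level 0); `min_visible_px` is a hypothetical visibility threshold.
    """
    level = 0
    size = float(expected_size_px)
    while level < levels - 1 and size / 2 >= min_visible_px:
        size /= 2
        level += 1
    return level
```

Under these assumptions, an object expected to span 32 pixels at full resolution, with a 4-pixel visibility threshold, would be searched for three levels up, where the image holds 1/64 of the original pixels.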
In practice, a visual hierarchy is a set of images of different sizes derived from the original input image, usually organized in descending or ascending order of spatial resolution.

It is the interaction between the conceptual hierarchy and the visual hierarchy that is important in this paper; this interaction produces a recognition strategy in response to a query.

The kernel of the paper is divided into three parts:
Section II: the formalism for knowledge representation,
Section III: the implementation,
Section IV: the results.
The overall scheme is shown in Fig. 1.

II. THE FORMALISM FOR KNOWLEDGE REPRESENTATION

There have been various attempts to develop a formal structure for the representation of knowledge in various AI systems. Among the more successful ones are:
production systems,
use of first-order predicate calculus,
procedural systems,
state-space representation,
and others.

Each of these systems has been used in its pure classical form as well as in various extensions required by the application problems. In computer vision, we can formulate the following requirements.
1) A flexible and consistent knowledge representation scheme, to which one can add or delete needed pieces of knowledge in a systematic fashion.
2) A facility for representing factual information (predicates, functions, or relations) and procedures which could extract information on request from the sensory data.
3) An inferencing scheme which could generate recognition strategies.

It is not our purpose to evaluate each of these systems. Rather, we wish to report our experience with one of them, namely the production system.

A. Production System

Production systems are a general class of computation mechanisms. They have been applied to a variety of problems. Although there is no standard production system, some unifying themes are embodied in each implementation (see [5] for an excellent overview).
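As a rough illustration of the general production-system idea, the following minimal sketch assumes the commonly described arrangement of a store of facts, a list of condition-action rules, and a control loop that fires them. It is not the system used in this paper: the function `run_production_system`, the "first applicable rule" conflict resolution, and every rule and fact name in `RULES` are hypothetical.

```python
def run_production_system(database, rules, max_cycles=100):
    """Fire the first applicable rule repeatedly until no rule adds anything.

    `database` is an iterable of facts (strings). Each rule is a tuple
    (name, conditions, additions), where conditions and additions are sets
    of facts. Choosing the first applicable rule is an assumed policy.
    """
    facts = set(database)
    for _ in range(max_cycles):
        for _name, conditions, additions in rules:
            # A rule applies when all its conditions hold and it would
            # contribute at least one fact not already in the database.
            if conditions <= facts and not additions <= facts:
                facts |= additions
                break
        else:
            return facts  # no rule fired in this cycle: quiescence
    return facts


# Hypothetical rules, invented for illustration only.
RULES = [
    ("cars-are-on-roads", {"query: car"}, {"search-area: road"}),
    ("cars-are-small", {"query: car"}, {"resolution: fine"}),
    ("locate-road-first", {"search-area: road"}, {"first-pass: coarse"}),
]

strategy = run_production_system({"query: car"}, RULES)
```

Starting from the single fact `"query: car"`, the loop derives a search area and resolution hints one rule at a time, which is one simple way a query could be expanded into a sequence of processing steps.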
A production system is composed of three elements: a database, a set of rules, and an interpreter. Implementations vary in the performance of the interpreter and the structure of the rules.

The production system that we are using was initially developed as a tool for natural language processing [7]. It was later used by Sloan [13] for sequencing operations in a computer vision experiment.

Before we describe in detail each element of our production system we need to explain the notion of
0162/8828/84/0500-0319$01.00 © 1984 IEEE