IEEE TRANSACTIONS ON PATTERN ANALYSIS AND MACHINE INTELLIGENCE, VOL. PAMI-6, NO. 3, MAY 1984
Correspondence
Visual and Conceptual Hierarchy: A Paradigm for Studies
of Automated Generation of Recognition Strategies
DAVID A. ROSENTHAL AND R. BAJCSY
Abstract-The purpose of this correspondence is to show the design
considerations in the choice of mechanisms when a flexible query-
system of visual scenes is being constructed. More concretely, the
issues are:
* flexibility in adding new information to the knowledge base;
* power of inferencing;
* avoiding unnecessary generation of hypotheses where a great deal
of image processing has to be performed in order to test them;
* having the power of automatic generation of recognition strategies.
Index Terms-Automatic generation of recognition strategies, com-
puter vision, rule based system.
I. INTRODUCTION
In the past most of the research in computer vision has con-
centrated on the development of image processing routines,
such as finding edges and regions, outlines, shapes, and their
descriptions. This process is usually called feature extraction
and/or image (scene) segmentation and its main purpose is to
reduce the data while preserving the most characteristic in-
variant or semi-invariant features. The recognition process
then takes these features and descriptions and matches them
to some Model(s). This process by and large is explicitly em-
bedded in the recognition program and frequently very depen-
dent on the application.
Recently, there have been attempts to make this process
more systematic and at the same time flexible [2], [3], [6].
Rule-based systems, in particular, have turned out to be very
useful for this purpose. The goal, however, should be to dis-
cover the right structure of the rules which in turn will allow
an automatic data and/or query driven generation of recogni-
tion strategies. This is exactly the subject of this paper.
We start with a query which restricts the areas of analysis to
highly probable places as well as suggesting a range of expected
sizes and shapes. In turn, the size expectation is translated into
the necessary spatial resolution which will make the searched
object visible. This naturally will reduce the number of pixels
used in the analysis. From there on the strategy is data driven,
i.e., depending on the local contrast, noise, continuity of bound-
aries, etc.
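The translation from expected size to spatial resolution can be illustrated with a minimal sketch. It assumes a resolution hierarchy in which each level halves the resolution of the one below it, and it assumes that an object must span at least a few pixels to remain visible. The function names, the halving factor, and the 4-pixel threshold are illustrative assumptions, not values taken from this correspondence.

```python
def coarsest_visible_level(expected_size_px, min_visible_px=4):
    """Return the coarsest level at which the object is still visible.

    Level 0 is the full-resolution image; each higher level is assumed
    to halve the resolution. min_visible_px is a hypothetical threshold.
    """
    level = 0
    # Go one level coarser while the object would still span enough pixels.
    while expected_size_px / 2 ** (level + 1) >= min_visible_px:
        level += 1
    return level


def pixels_at_level(width, height, level):
    """Number of pixels to examine at a given level of the hierarchy."""
    return (width >> level) * (height >> level)


# An object expected to be about 32 pixels long in a 512 x 512 image.
level = coarsest_visible_level(32)
saving = pixels_at_level(512, 512, 0) // pixels_at_level(512, 512, level)
```

Under these assumptions the search starts three levels above full resolution, where the image holds 64 times fewer pixels than the original.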
The domain chosen is aerial images of an urban area (Washington,
DC). The rationale behind this choice was that:
1) the aerial image is an orthogonal projection of a three-
dimensional world without any perspective distortion; hence
the image processing part will be simplified,
2) the urban area as opposed to the nonurban area was chosen
because of the available regular shapes in urban areas which in
turn are used as features during the recognition process.
Currently, the system is capable of locating different types
of motor vehicles, various street markings, buildings, objects
which appear on the tops of buildings, and other objects.
Manuscript received February 4, 1983; revised June 16, 1983. This
work was supported by the National Science Foundation under Grant
MCS-8207294.
The authors are with the Department of Computer Science and In-
formation Science, University of Pennsylvania, Philadelphia, PA 19104.
In our earlier paper [1], we described the motivation
behind the conceptual and visual hierarchy. Our notion of con-
ceptual hierarchies is that visual concepts or objects can be
ordered with respect to some criteria, such as part-whole rela-
tionships, class inclusion, contextual relations, etc.
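One way to picture such an ordering is as a mapping from each concept to the concept that contains it or provides its context. The sketch below is only an illustration: the concept names and the `CONTEXT` table are hypothetical and are not taken from the system's knowledge base.

```python
# Hypothetical contextual relations: each concept maps to the concept
# in which it is expected to be found.
CONTEXT = {
    "car": "street",
    "street marking": "street",
    "rooftop object": "building",
    "street": "urban area",
    "building": "urban area",
}


def context_chain(concept, context=CONTEXT):
    """List the enclosing concepts of `concept`, nearest first."""
    chain = []
    while concept in context:
        concept = context[concept]
        chain.append(concept)
    return chain


chain = context_chain("car")
```

Walking the chain for a queried object names the larger structures that could be located first in order to restrict where the object itself is sought.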
Visual hierarchy has received a fair amount of attention in
image processing in the past [4], [8], [9], [14]. It has been
called the "pyramid" data structure or processing cone. In
practice, a visual hierarchy is a set of different sized images of
the original input image, usually organized in descending or
ascending order with respect to the spatial resolution.
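A minimal sketch of such a hierarchy follows. It assumes that each coarser image is obtained by averaging non-overlapping 2 x 2 blocks of the image below it; this is one common construction and is not necessarily the one used in the cited work. Images are plain nested lists so that the example needs no external library.

```python
def downsample(image):
    """Halve the resolution by averaging non-overlapping 2 x 2 blocks."""
    rows, cols = len(image) // 2, len(image[0]) // 2
    return [
        [
            (image[2 * r][2 * c] + image[2 * r][2 * c + 1]
             + image[2 * r + 1][2 * c] + image[2 * r + 1][2 * c + 1]) / 4.0
            for c in range(cols)
        ]
        for r in range(rows)
    ]


def build_pyramid(image):
    """Return the images ordered from finest to coarsest resolution."""
    levels = [image]
    # Stop once a further halving would leave no rows or no columns.
    while len(levels[-1]) >= 2 and len(levels[-1][0]) >= 2:
        levels.append(downsample(levels[-1]))
    return levels


# A 4 x 4 test image whose pixel values are 0..15 in row-major order.
image = [[4 * r + c for c in range(4)] for r in range(4)]
pyramid = build_pyramid(image)
```

For the 4 x 4 test image this yields three levels of size 4 x 4, 2 x 2, and 1 x 1.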
It is the interaction between the conceptual hierarchy and
visual hierarchy which is important in this paper and this
interaction produces a recognition strategy in response to a
query.
The kernel of the paper is divided into three parts:
Section II: the formalism for knowledge representation,
Section III: the implementation,
Section IV: the results.
The overall scheme is in Fig. 1.
II. THE FORMALISM FOR KNOWLEDGE REPRESENTATION
There have been various attempts to develop a formal structure
for representation of knowledge of various AI systems.
Among the more successful ones are:
production systems
use of first-order predicate calculus
procedural systems
state-space representation, and others.
Each of these systems has been used both in its pure classical
form and in various extensions required by the application
problems. In computer vision, we can formulate the follow-
ing requirements.
1) The flexibility and consistency of a knowledge representa-
tion scheme to which one can add or delete needed pieces of
knowledge in a systematic fashion.
2) A facility for representing factual information (predicates,
functions or relations) and procedures which could extract in-
formation on request from the sensory data.
3) An inferencing scheme, which could generate recognition
strategies.
It is not our purpose to evaluate each of these systems. Rather,
we wish to report our experience with one of them, namely
the production system.
A. Production System
Production systems are a general class of computation mechanisms.
They have been applied to a variety of problems. Although
there is no standard production system, there are some
unifying themes that each implementation embodies (see [5]
for an excellent overview).
A production system is composed of three elements: a data-
base, a set of rules, and an interpreter. Implementations vary
in the performance of the interpreter and the structure of the
rules.
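The three elements can be illustrated with a generic forward-chaining sketch. This is not the interpreter of [7] or the rule structure used in our system; the facts and rules below are hypothetical and serve only to show how a database, a rule set, and an interpreter fit together.

```python
def interpret(database, rules):
    """Fire rules until no rule can add a new fact to the database."""
    facts = set(database)
    changed = True
    while changed:
        changed = False
        for conditions, additions in rules:
            # A rule fires when all its conditions hold and it adds something new.
            if conditions <= facts and not additions <= facts:
                facts |= additions
                changed = True
    return facts


# Hypothetical rules: (conditions, facts added when the conditions hold).
rules = [
    (frozenset({"rectangular region", "on street"}),
     frozenset({"vehicle candidate"})),
    (frozenset({"vehicle candidate", "length about 5 m"}),
     frozenset({"car"})),
]

# Hypothetical initial database of facts extracted from an image.
database = {"rectangular region", "on street", "length about 5 m"}

result = interpret(database, rules)
```

Here the first rule adds "vehicle candidate" to the database, which in turn allows the second rule to fire and add "car". Real implementations differ mainly in how the interpreter selects among the rules that are eligible to fire.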
The production system that we are using was initially devel-
oped as a tool for natural language processing [7]. It was later
used by Sloan [13] for sequencing operations in a computer
vision experiment. Before we describe in detail each element
of our production system we need to explain the notion of